All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 2/2] radix-tree: Fix optimisation problem
  2016-09-22 18:53   ` Matthew Wilcox
@ 2016-09-22 18:09     ` Linus Torvalds
  -1 siblings, 0 replies; 33+ messages in thread
From: Linus Torvalds @ 2016-09-22 18:09 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Kirill A. Shutemov, Andrew Morton, Konstantin Khlebnikov,
	Ross Zwisler, Linux Kernel Mailing List, linux-mm, linux-fsdevel,
	Matthew Wilcox

On Thu, Sep 22, 2016 at 11:53 AM, Matthew Wilcox
<mawilcox@linuxonhyperv.com> wrote:
>
>           Change the test suite to compile with -O2, and
> fix the optimisation problem by passing 'entry' through entry_to_node()
> so gcc knows this isn't a plain pointer.

Ugh. I really don't like this patch very much.

Wouldn't it be cleaner to just fix "get_slot_offset()" instead? As it
is, looking at the code, I suspect that it's really hard to convince
people that there isn't some other place this might happen. Because
the "pointer subtraction followed by pointer addition" pattern is all
hidden in these inline functions.

Or at least add a big comment about why this is the only such case.

Because without that, the code now looks very bad.

                   Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] radix-tree: Fix optimisation problem
@ 2016-09-22 18:09     ` Linus Torvalds
  0 siblings, 0 replies; 33+ messages in thread
From: Linus Torvalds @ 2016-09-22 18:09 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Kirill A. Shutemov, Andrew Morton, Konstantin Khlebnikov,
	Ross Zwisler, Linux Kernel Mailing List, linux-mm, linux-fsdevel,
	Matthew Wilcox

On Thu, Sep 22, 2016 at 11:53 AM, Matthew Wilcox
<mawilcox@linuxonhyperv.com> wrote:
>
>           Change the test suite to compile with -O2, and
> fix the optimisation problem by passing 'entry' through entry_to_node()
> so gcc knows this isn't a plain pointer.

Ugh. I really don't like this patch very much.

Wouldn't it be cleaner to just fix "get_slot_offset()" instead? As it
is, looking at the code, I suspect that it's really hard to convince
people that there isn't some other place this might happen. Because
the "pointer subtraction followed by pointer addition" pattern is all
hidden in these inline functions.

Or at least add a big comment about why this is the only such case.

Because without that, the code now looks very bad.

                   Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 0/2] Fix radix_tree_lookup_slot()
@ 2016-09-22 18:53 ` Matthew Wilcox
  0 siblings, 0 replies; 33+ messages in thread
From: Matthew Wilcox @ 2016-09-22 18:53 UTC (permalink / raw)
  To: Linus Torvalds, Kirill A. Shutemov, Andrew Morton,
	Konstantin Khlebnikov, Ross Zwisler
  Cc: linux-kernel, linux-mm, linux-fsdevel, Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

Hi Linus,

Please apply for 4.8.  The same bug is also present in 4.7, but is
probably latent.

Matthew Wilcox (2):
  radix tree test suite: Test radix_tree_replace_slot() for multiorder
    entries
  radix-tree: Fix optimisation problem

 lib/radix-tree.c                      |  3 ++-
 tools/testing/radix-tree/Makefile     |  2 +-
 tools/testing/radix-tree/multiorder.c | 16 ++++++++++++----
 3 files changed, 15 insertions(+), 6 deletions(-)

-- 
2.9.3

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 0/2] Fix radix_tree_lookup_slot()
@ 2016-09-22 18:53 ` Matthew Wilcox
  0 siblings, 0 replies; 33+ messages in thread
From: Matthew Wilcox @ 2016-09-22 18:53 UTC (permalink / raw)
  To: Linus Torvalds, Kirill A. Shutemov, Andrew Morton,
	Konstantin Khlebnikov, Ross Zwisler
  Cc: linux-kernel, linux-mm, linux-fsdevel, Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

Hi Linus,

Please apply for 4.8.  The same bug is also present in 4.7, but is
probably latent.

Matthew Wilcox (2):
  radix tree test suite: Test radix_tree_replace_slot() for multiorder
    entries
  radix-tree: Fix optimisation problem

 lib/radix-tree.c                      |  3 ++-
 tools/testing/radix-tree/Makefile     |  2 +-
 tools/testing/radix-tree/multiorder.c | 16 ++++++++++++----
 3 files changed, 15 insertions(+), 6 deletions(-)

-- 
2.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 1/2] radix tree test suite: Test radix_tree_replace_slot() for multiorder entries
  2016-09-22 18:53 ` Matthew Wilcox
@ 2016-09-22 18:53   ` Matthew Wilcox
  -1 siblings, 0 replies; 33+ messages in thread
From: Matthew Wilcox @ 2016-09-22 18:53 UTC (permalink / raw)
  To: Linus Torvalds, Kirill A. Shutemov, Andrew Morton,
	Konstantin Khlebnikov, Ross Zwisler
  Cc: linux-kernel, linux-mm, linux-fsdevel, Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

When we replace a multiorder entry, check that all indices reflect the
new value.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 tools/testing/radix-tree/multiorder.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/tools/testing/radix-tree/multiorder.c b/tools/testing/radix-tree/multiorder.c
index 39d9b95..05d7bc4 100644
--- a/tools/testing/radix-tree/multiorder.c
+++ b/tools/testing/radix-tree/multiorder.c
@@ -124,6 +124,8 @@ static void multiorder_check(unsigned long index, int order)
 	unsigned long i;
 	unsigned long min = index & ~((1UL << order) - 1);
 	unsigned long max = min + (1UL << order);
+	void **slot;
+	struct item *item2 = item_create(min);
 	RADIX_TREE(tree, GFP_KERNEL);
 
 	printf("Multiorder index %ld, order %d\n", index, order);
@@ -139,13 +141,19 @@ static void multiorder_check(unsigned long index, int order)
 		item_check_absent(&tree, i);
 	for (i = max; i < 2*max; i++)
 		item_check_absent(&tree, i);
+	for (i = min; i < max; i++)
+		assert(radix_tree_insert(&tree, i, item2) == -EEXIST);
+
+	slot = radix_tree_lookup_slot(&tree, index);
+	free(*slot);
+	radix_tree_replace_slot(slot, item2);
 	for (i = min; i < max; i++) {
-		static void *entry = (void *)
-					(0xA0 | RADIX_TREE_EXCEPTIONAL_ENTRY);
-		assert(radix_tree_insert(&tree, i, entry) == -EEXIST);
+		struct item *item = item_lookup(&tree, i);
+		assert(item != 0);
+		assert(item->index == min);
 	}
 
-	assert(item_delete(&tree, index) != 0);
+	assert(item_delete(&tree, min) != 0);
 
 	for (i = 0; i < 2*max; i++)
 		item_check_absent(&tree, i);
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 1/2] radix tree test suite: Test radix_tree_replace_slot() for multiorder entries
@ 2016-09-22 18:53   ` Matthew Wilcox
  0 siblings, 0 replies; 33+ messages in thread
From: Matthew Wilcox @ 2016-09-22 18:53 UTC (permalink / raw)
  To: Linus Torvalds, Kirill A. Shutemov, Andrew Morton,
	Konstantin Khlebnikov, Ross Zwisler
  Cc: linux-kernel, linux-mm, linux-fsdevel, Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

When we replace a multiorder entry, check that all indices reflect the
new value.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 tools/testing/radix-tree/multiorder.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/tools/testing/radix-tree/multiorder.c b/tools/testing/radix-tree/multiorder.c
index 39d9b95..05d7bc4 100644
--- a/tools/testing/radix-tree/multiorder.c
+++ b/tools/testing/radix-tree/multiorder.c
@@ -124,6 +124,8 @@ static void multiorder_check(unsigned long index, int order)
 	unsigned long i;
 	unsigned long min = index & ~((1UL << order) - 1);
 	unsigned long max = min + (1UL << order);
+	void **slot;
+	struct item *item2 = item_create(min);
 	RADIX_TREE(tree, GFP_KERNEL);
 
 	printf("Multiorder index %ld, order %d\n", index, order);
@@ -139,13 +141,19 @@ static void multiorder_check(unsigned long index, int order)
 		item_check_absent(&tree, i);
 	for (i = max; i < 2*max; i++)
 		item_check_absent(&tree, i);
+	for (i = min; i < max; i++)
+		assert(radix_tree_insert(&tree, i, item2) == -EEXIST);
+
+	slot = radix_tree_lookup_slot(&tree, index);
+	free(*slot);
+	radix_tree_replace_slot(slot, item2);
 	for (i = min; i < max; i++) {
-		static void *entry = (void *)
-					(0xA0 | RADIX_TREE_EXCEPTIONAL_ENTRY);
-		assert(radix_tree_insert(&tree, i, entry) == -EEXIST);
+		struct item *item = item_lookup(&tree, i);
+		assert(item != 0);
+		assert(item->index == min);
 	}
 
-	assert(item_delete(&tree, index) != 0);
+	assert(item_delete(&tree, min) != 0);
 
 	for (i = 0; i < 2*max; i++)
 		item_check_absent(&tree, i);
-- 
2.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 2/2] radix-tree: Fix optimisation problem
  2016-09-22 18:53 ` Matthew Wilcox
@ 2016-09-22 18:53   ` Matthew Wilcox
  -1 siblings, 0 replies; 33+ messages in thread
From: Matthew Wilcox @ 2016-09-22 18:53 UTC (permalink / raw)
  To: Linus Torvalds, Kirill A. Shutemov, Andrew Morton,
	Konstantin Khlebnikov, Ross Zwisler
  Cc: linux-kernel, linux-mm, linux-fsdevel, Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

When compiling the radix tree with -O2, GCC thinks it can optimise:

	void *entry = parent->slots[offset];
	int siboff = entry - parent->slots;
	void *slot = parent->slots + siboff;

into

	void *slot = entry;

Unfortunately, 'entry' is a tagged pointer, so this optimisation leads
to getting an unaligned pointer back from radix_tree_lookup_slot().
The test suite wasn't being compiled with optimisation, so we hadn't
spotted it before now.  Change the test suite to compile with -O2, and
fix the optimisation problem by passing 'entry' through entry_to_node()
so gcc knows this isn't a plain pointer.
---
 lib/radix-tree.c                  | 3 ++-
 tools/testing/radix-tree/Makefile | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 1b7bf73..8bf1f32 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -105,7 +105,8 @@ static unsigned int radix_tree_descend(struct radix_tree_node *parent,
 
 #ifdef CONFIG_RADIX_TREE_MULTIORDER
 	if (radix_tree_is_internal_node(entry)) {
-		unsigned long siboff = get_slot_offset(parent, entry);
+		unsigned long siboff = get_slot_offset(parent,
+						(void **)entry_to_node(entry));
 		if (siboff < RADIX_TREE_MAP_SIZE) {
 			offset = siboff;
 			entry = rcu_dereference_raw(parent->slots[offset]);
diff --git a/tools/testing/radix-tree/Makefile b/tools/testing/radix-tree/Makefile
index 3b53046..9d0919ed 100644
--- a/tools/testing/radix-tree/Makefile
+++ b/tools/testing/radix-tree/Makefile
@@ -1,5 +1,5 @@
 
-CFLAGS += -I. -g -Wall -D_LGPL_SOURCE
+CFLAGS += -I. -g -O2 -Wall -D_LGPL_SOURCE
 LDFLAGS += -lpthread -lurcu
 TARGETS = main
 OFILES = main.o radix-tree.o linux.o test.o tag_check.o find_next_bit.o \
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 2/2] radix-tree: Fix optimisation problem
@ 2016-09-22 18:53   ` Matthew Wilcox
  0 siblings, 0 replies; 33+ messages in thread
From: Matthew Wilcox @ 2016-09-22 18:53 UTC (permalink / raw)
  To: Linus Torvalds, Kirill A. Shutemov, Andrew Morton,
	Konstantin Khlebnikov, Ross Zwisler
  Cc: linux-kernel, linux-mm, linux-fsdevel, Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

When compiling the radix tree with -O2, GCC thinks it can optimise:

	void *entry = parent->slots[offset];
	int siboff = entry - parent->slots;
	void *slot = parent->slots + siboff;

into

	void *slot = entry;

Unfortunately, 'entry' is a tagged pointer, so this optimisation leads
to getting an unaligned pointer back from radix_tree_lookup_slot().
The test suite wasn't being compiled with optimisation, so we hadn't
spotted it before now.  Change the test suite to compile with -O2, and
fix the optimisation problem by passing 'entry' through entry_to_node()
so gcc knows this isn't a plain pointer.
---
 lib/radix-tree.c                  | 3 ++-
 tools/testing/radix-tree/Makefile | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 1b7bf73..8bf1f32 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -105,7 +105,8 @@ static unsigned int radix_tree_descend(struct radix_tree_node *parent,
 
 #ifdef CONFIG_RADIX_TREE_MULTIORDER
 	if (radix_tree_is_internal_node(entry)) {
-		unsigned long siboff = get_slot_offset(parent, entry);
+		unsigned long siboff = get_slot_offset(parent,
+						(void **)entry_to_node(entry));
 		if (siboff < RADIX_TREE_MAP_SIZE) {
 			offset = siboff;
 			entry = rcu_dereference_raw(parent->slots[offset]);
diff --git a/tools/testing/radix-tree/Makefile b/tools/testing/radix-tree/Makefile
index 3b53046..9d0919ed 100644
--- a/tools/testing/radix-tree/Makefile
+++ b/tools/testing/radix-tree/Makefile
@@ -1,5 +1,5 @@
 
-CFLAGS += -I. -g -Wall -D_LGPL_SOURCE
+CFLAGS += -I. -g -O2 -Wall -D_LGPL_SOURCE
 LDFLAGS += -lpthread -lurcu
 TARGETS = main
 OFILES = main.o radix-tree.o linux.o test.o tag_check.o find_next_bit.o \
-- 
2.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* RE: [PATCH 2/2] radix-tree: Fix optimisation problem
  2016-09-22 18:09     ` Linus Torvalds
  (?)
@ 2016-09-23 20:16     ` Matthew Wilcox
  2016-09-24 20:21         ` Linus Torvalds
  -1 siblings, 1 reply; 33+ messages in thread
From: Matthew Wilcox @ 2016-09-23 20:16 UTC (permalink / raw)
  To: Linus Torvalds, Matthew Wilcox
  Cc: Kirill A. Shutemov, Andrew Morton, Konstantin Khlebnikov,
	Ross Zwisler, Linux Kernel Mailing List, linux-mm, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 3327 bytes --]

From: linus971@gmail.com [mailto:linus971@gmail.com] On Behalf Of Linus Torvalds
> On Thu, Sep 22, 2016 at 11:53 AM, Matthew Wilcox
> <mawilcox@linuxonhyperv.com> wrote:
> >
> >           Change the test suite to compile with -O2, and
> > fix the optimisation problem by passing 'entry' through entry_to_node()
> > so gcc knows this isn't a plain pointer.
> 
> Ugh. I really don't like this patch very much.
> 
> Wouldn't it be cleaner to just fix "get_slot_offset()" instead? As it
> is, looking at the code, I suspect that it's really hard to convince
> people that there isn't some other place this might happen. Because
> the "pointer subtraction followed by pointer addition" pattern is all
> hidden in these inline functions.
> 
> Or at least add a big comment about why this is the only such case.
> 
> Because without that, the code now looks very bad.

That's fair.  I looked at all the other callers of get_slot_offset, and all the others are using a real slot pointer.  radix_tree_descend() really is the outlier here.  I think the real problem is that the types in the tree are wrong; instead of storing void *, we should be storing uintptr_t.  But fixing that is a little beyond the scope of -rc8.  Here's a slightly better version which asserts that the passed pointer really is a pointer.

(attached as well, I have no idea whether this patch will get mangled)

diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 1b7bf73..368f641 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -91,9 +91,15 @@ static inline bool is_sibling_entry(struct radix_tree_node *parent, void *node)
 }
 #endif
 
+/*
+ * The slot pointer must be a real pointer as GCC will optimise
+ * through inlined functions and may deduce that
+ * parent->slots + get_slot_offset(parent, slot) == slot
+ */
 static inline unsigned long get_slot_offset(struct radix_tree_node *parent,
 						 void **slot)
 {
+	BUG_ON(radix_tree_exception(slot));
 	return slot - parent->slots;
 }
 
@@ -101,11 +107,12 @@ static unsigned int radix_tree_descend(struct radix_tree_node *parent,
 			struct radix_tree_node **nodep, unsigned long index)
 {
 	unsigned int offset = (index >> parent->shift) & RADIX_TREE_MAP_MASK;
-	void **entry = rcu_dereference_raw(parent->slots[offset]);
+	void *entry = rcu_dereference_raw(parent->slots[offset]);
 
 #ifdef CONFIG_RADIX_TREE_MULTIORDER
 	if (radix_tree_is_internal_node(entry)) {
-		unsigned long siboff = get_slot_offset(parent, entry);
+		unsigned long siboff = get_slot_offset(parent,
+						(void **)entry_to_node(entry));
 		if (siboff < RADIX_TREE_MAP_SIZE) {
 			offset = siboff;
 			entry = rcu_dereference_raw(parent->slots[offset]);
@@ -113,7 +120,7 @@ static unsigned int radix_tree_descend(struct radix_tree_node *parent,
 	}
 #endif
 
-	*nodep = (void *)entry;
+	*nodep = entry;
 	return offset;
 }
 
diff --git a/tools/testing/radix-tree/Makefile b/tools/testing/radix-tree/Makefile
index 3b53046..9d0919ed 100644
--- a/tools/testing/radix-tree/Makefile
+++ b/tools/testing/radix-tree/Makefile
@@ -1,5 +1,5 @@
 
-CFLAGS += -I. -g -Wall -D_LGPL_SOURCE
+CFLAGS += -I. -g -O2 -Wall -D_LGPL_SOURCE
 LDFLAGS += -lpthread -lurcu
 TARGETS = main
 OFILES = main.o radix-tree.o linux.o test.o tag_check.o find_next_bit.o \

[-- Attachment #2: for-linus.diff --]
[-- Type: application/octet-stream, Size: 1871 bytes --]

diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 1b7bf73..368f641 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -91,9 +91,15 @@ static inline bool is_sibling_entry(struct radix_tree_node *parent, void *node)
 }
 #endif
 
+/*
+ * The slot pointer must be a real pointer as GCC will optimise
+ * through inlined functions and may deduce that
+ * parent->slots + get_slot_offset(parent, slot) == slot
+ */
 static inline unsigned long get_slot_offset(struct radix_tree_node *parent,
 						 void **slot)
 {
+	BUG_ON(radix_tree_exception(slot));
 	return slot - parent->slots;
 }
 
@@ -101,11 +107,12 @@ static unsigned int radix_tree_descend(struct radix_tree_node *parent,
 			struct radix_tree_node **nodep, unsigned long index)
 {
 	unsigned int offset = (index >> parent->shift) & RADIX_TREE_MAP_MASK;
-	void **entry = rcu_dereference_raw(parent->slots[offset]);
+	void *entry = rcu_dereference_raw(parent->slots[offset]);
 
 #ifdef CONFIG_RADIX_TREE_MULTIORDER
 	if (radix_tree_is_internal_node(entry)) {
-		unsigned long siboff = get_slot_offset(parent, entry);
+		unsigned long siboff = get_slot_offset(parent,
+						(void **)entry_to_node(entry));
 		if (siboff < RADIX_TREE_MAP_SIZE) {
 			offset = siboff;
 			entry = rcu_dereference_raw(parent->slots[offset]);
@@ -113,7 +120,7 @@ static unsigned int radix_tree_descend(struct radix_tree_node *parent,
 	}
 #endif
 
-	*nodep = (void *)entry;
+	*nodep = entry;
 	return offset;
 }
 
diff --git a/tools/testing/radix-tree/Makefile b/tools/testing/radix-tree/Makefile
index 3b53046..9d0919ed 100644
--- a/tools/testing/radix-tree/Makefile
+++ b/tools/testing/radix-tree/Makefile
@@ -1,5 +1,5 @@
 
-CFLAGS += -I. -g -Wall -D_LGPL_SOURCE
+CFLAGS += -I. -g -O2 -Wall -D_LGPL_SOURCE
 LDFLAGS += -lpthread -lurcu
 TARGETS = main
 OFILES = main.o radix-tree.o linux.o test.o tag_check.o find_next_bit.o \

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] radix-tree: Fix optimisation problem
  2016-09-22 18:53   ` Matthew Wilcox
@ 2016-09-24  8:36     ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 33+ messages in thread
From: Konstantin Khlebnikov @ 2016-09-24  8:36 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Linus Torvalds, Kirill A. Shutemov, Andrew Morton, Ross Zwisler,
	Linux Kernel Mailing List, linux-mm, linux-fsdevel,
	Matthew Wilcox

On Thu, Sep 22, 2016 at 9:53 PM, Matthew Wilcox
<mawilcox@linuxonhyperv.com> wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
>
> When compiling the radix tree with -O2, GCC thinks it can optimise:
>
>         void *entry = parent->slots[offset];
>         int siboff = entry - parent->slots;
>         void *slot = parent->slots + siboff;
>
> into
>
>         void *slot = entry;
>
> Unfortunately, 'entry' is a tagged pointer, so this optimisation leads
> to getting an unaligned pointer back from radix_tree_lookup_slot().
> The test suite wasn't being compiled with optimisation, so we hadn't
> spotted it before now.  Change the test suite to compile with -O2, and
> fix the optimisation problem by passing 'entry' through entry_to_node()
> so gcc knows this isn't a plain pointer.
> ---
>  lib/radix-tree.c                  | 3 ++-
>  tools/testing/radix-tree/Makefile | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index 1b7bf73..8bf1f32 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -105,7 +105,8 @@ static unsigned int radix_tree_descend(struct radix_tree_node *parent,
>
>  #ifdef CONFIG_RADIX_TREE_MULTIORDER
>         if (radix_tree_is_internal_node(entry)) {
> -               unsigned long siboff = get_slot_offset(parent, entry);
> +               unsigned long siboff = get_slot_offset(parent,
> +                                               (void **)entry_to_node(entry));

As I see this is the only place where get_slot_offset used for
unaligned pointer.
Nobody uses "multiorder entries" so this never happens. And I have
plan to kill this code.

>                 if (siboff < RADIX_TREE_MAP_SIZE) {
>                         offset = siboff;
>                         entry = rcu_dereference_raw(parent->slots[offset]);
> diff --git a/tools/testing/radix-tree/Makefile b/tools/testing/radix-tree/Makefile
> index 3b53046..9d0919ed 100644
> --- a/tools/testing/radix-tree/Makefile
> +++ b/tools/testing/radix-tree/Makefile
> @@ -1,5 +1,5 @@
>
> -CFLAGS += -I. -g -Wall -D_LGPL_SOURCE
> +CFLAGS += -I. -g -O2 -Wall -D_LGPL_SOURCE
>  LDFLAGS += -lpthread -lurcu
>  TARGETS = main
>  OFILES = main.o radix-tree.o linux.o test.o tag_check.o find_next_bit.o \
> --
> 2.9.3
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] radix-tree: Fix optimisation problem
@ 2016-09-24  8:36     ` Konstantin Khlebnikov
  0 siblings, 0 replies; 33+ messages in thread
From: Konstantin Khlebnikov @ 2016-09-24  8:36 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Linus Torvalds, Kirill A. Shutemov, Andrew Morton, Ross Zwisler,
	Linux Kernel Mailing List, linux-mm, linux-fsdevel,
	Matthew Wilcox

On Thu, Sep 22, 2016 at 9:53 PM, Matthew Wilcox
<mawilcox@linuxonhyperv.com> wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
>
> When compiling the radix tree with -O2, GCC thinks it can optimise:
>
>         void *entry = parent->slots[offset];
>         int siboff = entry - parent->slots;
>         void *slot = parent->slots + siboff;
>
> into
>
>         void *slot = entry;
>
> Unfortunately, 'entry' is a tagged pointer, so this optimisation leads
> to getting an unaligned pointer back from radix_tree_lookup_slot().
> The test suite wasn't being compiled with optimisation, so we hadn't
> spotted it before now.  Change the test suite to compile with -O2, and
> fix the optimisation problem by passing 'entry' through entry_to_node()
> so gcc knows this isn't a plain pointer.
> ---
>  lib/radix-tree.c                  | 3 ++-
>  tools/testing/radix-tree/Makefile | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index 1b7bf73..8bf1f32 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -105,7 +105,8 @@ static unsigned int radix_tree_descend(struct radix_tree_node *parent,
>
>  #ifdef CONFIG_RADIX_TREE_MULTIORDER
>         if (radix_tree_is_internal_node(entry)) {
> -               unsigned long siboff = get_slot_offset(parent, entry);
> +               unsigned long siboff = get_slot_offset(parent,
> +                                               (void **)entry_to_node(entry));

As I see this is the only place where get_slot_offset used for
unaligned pointer.
Nobody uses "multiorder entries" so this never happens. And I have
plan to kill this code.

>                 if (siboff < RADIX_TREE_MAP_SIZE) {
>                         offset = siboff;
>                         entry = rcu_dereference_raw(parent->slots[offset]);
> diff --git a/tools/testing/radix-tree/Makefile b/tools/testing/radix-tree/Makefile
> index 3b53046..9d0919ed 100644
> --- a/tools/testing/radix-tree/Makefile
> +++ b/tools/testing/radix-tree/Makefile
> @@ -1,5 +1,5 @@
>
> -CFLAGS += -I. -g -Wall -D_LGPL_SOURCE
> +CFLAGS += -I. -g -O2 -Wall -D_LGPL_SOURCE
>  LDFLAGS += -lpthread -lurcu
>  TARGETS = main
>  OFILES = main.o radix-tree.o linux.o test.o tag_check.o find_next_bit.o \
> --
> 2.9.3
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] radix-tree: Fix optimisation problem
  2016-09-23 20:16     ` Matthew Wilcox
@ 2016-09-24 20:21         ` Linus Torvalds
  0 siblings, 0 replies; 33+ messages in thread
From: Linus Torvalds @ 2016-09-24 20:21 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Matthew Wilcox, Kirill A. Shutemov, Andrew Morton,
	Konstantin Khlebnikov, Ross Zwisler, Linux Kernel Mailing List,
	linux-mm, linux-fsdevel

On Fri, Sep 23, 2016 at 1:16 PM, Matthew Wilcox <mawilcox@microsoft.com> wrote:
>
>  #ifdef CONFIG_RADIX_TREE_MULTIORDER
>         if (radix_tree_is_internal_node(entry)) {
> -               unsigned long siboff = get_slot_offset(parent, entry);
> +               unsigned long siboff = get_slot_offset(parent,
> +                                               (void **)entry_to_node(entry));

I feel that it is *this* part that I think needs a huge honking comment.

If you are going to make get_slot_offset() different, then you could
just rewrite get_slot_offset() to do

        unsigned long diff = (unsigned long) slot - (unsigned
long)parent->slots;
        return diff / sizeof(void *);

and add a comment to say "don't do this as a pointer diff, because
'slot' may not be an aligned pointer". No BUG_ON() necessary, because
it "just works".

At that point, gcc should just generate the right code, because it
doesn't see it as a pointer subtraction followed by a pointer
addition.

And yes, that crazy " (void **)entry_to_node(entry)" fixes it *too*,
but it needs a *comment*.

Why is that special, when all the other uses of get_slot_offset()
don't have that? *That* is what should be explained. Not some internal
detail.

That said, if this code isn't even used, as Konstantin says (THP
selects it - doesn't THP use it?), then the fix really should be to
just remove the odd code instead of adding to it.

Looking around for uses that set "order" to anything but zero, I
really don't see it. So maybe we should just do *that* trivial thing
instead, and remove CONFIG_RADIX_TREE_MULTIORDER, since it's appears
to be buggy and always has been.

                  Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] radix-tree: Fix optimisation problem
@ 2016-09-24 20:21         ` Linus Torvalds
  0 siblings, 0 replies; 33+ messages in thread
From: Linus Torvalds @ 2016-09-24 20:21 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Matthew Wilcox, Kirill A. Shutemov, Andrew Morton,
	Konstantin Khlebnikov, Ross Zwisler, Linux Kernel Mailing List,
	linux-mm, linux-fsdevel

On Fri, Sep 23, 2016 at 1:16 PM, Matthew Wilcox <mawilcox@microsoft.com> wrote:
>
>  #ifdef CONFIG_RADIX_TREE_MULTIORDER
>         if (radix_tree_is_internal_node(entry)) {
> -               unsigned long siboff = get_slot_offset(parent, entry);
> +               unsigned long siboff = get_slot_offset(parent,
> +                                               (void **)entry_to_node(entry));

I feel that it is *this* part that I think needs a huge honking comment.

If you are going to make get_slot_offset() different, then you could
just rewrite get_slot_offset() to do

        unsigned long diff = (unsigned long) slot - (unsigned
long)parent->slots;
        return diff / sizeof(void *);

and add a comment to say "don't do this as a pointer diff, because
'slot' may not be an aligned pointer". No BUG_ON() necessary, because
it "just works".

At that point, gcc should just generate the right code, because it
doesn't see it as a pointer subtraction followed by a pointer
addition.

And yes, that crazy " (void **)entry_to_node(entry)" fixes it *too*,
but it needs a *comment*.

Why is that special, when all the other uses of get_slot_offset()
don't have that? *That* is what should be explained. Not some internal
detail.

That said, if this code isn't even used, as Konstantin says (THP
selects it - doesn't THP use it?), then the fix really should be to
just remove the odd code instead of adding to it.

Looking around for uses that set "order" to anything but zero, I
really don't see it. So maybe we should just do *that* trivial thing
instead, and remove CONFIG_RADIX_TREE_MULTIORDER, since it's appears
to be buggy and always has been.

                  Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] radix-tree: Fix optimisation problem
  2016-09-24 20:21         ` Linus Torvalds
  (?)
@ 2016-09-24 20:47         ` Linus Torvalds
  -1 siblings, 0 replies; 33+ messages in thread
From: Linus Torvalds @ 2016-09-24 20:47 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Matthew Wilcox, Kirill A. Shutemov, Andrew Morton,
	Konstantin Khlebnikov, Ross Zwisler, Linux Kernel Mailing List,
	linux-mm, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 991 bytes --]

On Sat, Sep 24, 2016 at 1:21 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> That said, if this code isn't even used, as Konstantin says (THP
> selects it - doesn't THP use it?), then the fix really should be to
> just remove the odd code instead of adding to it.
>
> Looking around for uses that set "order" to anything but zero, I
> really don't see it. So maybe we should just do *that* trivial thing
> instead, and remove CONFIG_RADIX_TREE_MULTIORDER, since it's appears
> to be buggy and always has been.

IOW, a patch something like this?

NOTE! This is entirely untested. Things still seem to compile with it,
at least with some configurations. That's all I can say.

I do like this part:

   11 files changed, 29 insertions(+), 518 deletions(-)

although admittedly 2/3rds of the deletions were for the multiorder
tests. But even if you ignore the test side, it's just fairly clean
removal of code that is apparently not used, and that was buggy.

               Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 26375 bytes --]

 include/linux/radix-tree.h                    |  48 +---
 lib/Kconfig                                   |   3 -
 lib/radix-tree.c                              | 100 +-------
 mm/Kconfig                                    |   1 -
 mm/filemap.c                                  |   2 +-
 tools/testing/radix-tree/Makefile             |   2 +-
 tools/testing/radix-tree/generated/autoconf.h |   1 -
 tools/testing/radix-tree/main.c               |  34 +--
 tools/testing/radix-tree/multiorder.c         | 337 --------------------------
 tools/testing/radix-tree/test.c               |  14 +-
 tools/testing/radix-tree/test.h               |   5 -
 11 files changed, 29 insertions(+), 518 deletions(-)

diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index 4c45105dece3..ac546baa604c 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -263,15 +263,9 @@ static inline void radix_tree_replace_slot(void **pslot, void *item)
 }
 
 int __radix_tree_create(struct radix_tree_root *root, unsigned long index,
-			unsigned order, struct radix_tree_node **nodep,
+			struct radix_tree_node **nodep,
 			void ***slotp);
-int __radix_tree_insert(struct radix_tree_root *, unsigned long index,
-			unsigned order, void *);
-static inline int radix_tree_insert(struct radix_tree_root *root,
-			unsigned long index, void *entry)
-{
-	return __radix_tree_insert(root, index, 0, entry);
-}
+int radix_tree_insert(struct radix_tree_root *, unsigned long index, void *);
 void *__radix_tree_lookup(struct radix_tree_root *root, unsigned long index,
 			  struct radix_tree_node **nodep, void ***slotp);
 void *radix_tree_lookup(struct radix_tree_root *, unsigned long);
@@ -338,20 +332,8 @@ struct radix_tree_iter {
 	unsigned long	index;
 	unsigned long	next_index;
 	unsigned long	tags;
-#ifdef CONFIG_RADIX_TREE_MULTIORDER
-	unsigned int	shift;
-#endif
 };
 
-static inline unsigned int iter_shift(struct radix_tree_iter *iter)
-{
-#ifdef CONFIG_RADIX_TREE_MULTIORDER
-	return iter->shift;
-#else
-	return 0;
-#endif
-}
-
 #define RADIX_TREE_ITER_TAG_MASK	0x00FF	/* tag index in lower byte */
 #define RADIX_TREE_ITER_TAGGED		0x0100	/* lookup tagged slots */
 #define RADIX_TREE_ITER_CONTIG		0x0200	/* stop at first hole */
@@ -415,7 +397,7 @@ void **radix_tree_iter_retry(struct radix_tree_iter *iter)
 static inline unsigned long
 __radix_tree_iter_add(struct radix_tree_iter *iter, unsigned long slots)
 {
-	return iter->index + (slots << iter_shift(iter));
+	return iter->index + slots;
 }
 
 /**
@@ -443,7 +425,7 @@ void **radix_tree_iter_next(struct radix_tree_iter *iter)
 static __always_inline long
 radix_tree_chunk_size(struct radix_tree_iter *iter)
 {
-	return (iter->next_index - iter->index) >> iter_shift(iter);
+	return iter->next_index - iter->index;
 }
 
 static inline struct radix_tree_node *entry_to_node(void *ptr)
@@ -466,22 +448,9 @@ static __always_inline void **
 radix_tree_next_slot(void **slot, struct radix_tree_iter *iter, unsigned flags)
 {
 	if (flags & RADIX_TREE_ITER_TAGGED) {
-		void *canon = slot;
-
 		iter->tags >>= 1;
 		if (unlikely(!iter->tags))
 			return NULL;
-		while (IS_ENABLED(CONFIG_RADIX_TREE_MULTIORDER) &&
-					radix_tree_is_internal_node(slot[1])) {
-			if (entry_to_node(slot[1]) == canon) {
-				iter->tags >>= 1;
-				iter->index = __radix_tree_iter_add(iter, 1);
-				slot++;
-				continue;
-			}
-			iter->next_index = __radix_tree_iter_add(iter, 1);
-			return NULL;
-		}
 		if (likely(iter->tags & 1ul)) {
 			iter->index = __radix_tree_iter_add(iter, 1);
 			return slot + 1;
@@ -495,20 +464,11 @@ radix_tree_next_slot(void **slot, struct radix_tree_iter *iter, unsigned flags)
 		}
 	} else {
 		long count = radix_tree_chunk_size(iter);
-		void *canon = slot;
 
 		while (--count > 0) {
 			slot++;
 			iter->index = __radix_tree_iter_add(iter, 1);
 
-			if (IS_ENABLED(CONFIG_RADIX_TREE_MULTIORDER) &&
-			    radix_tree_is_internal_node(*slot)) {
-				if (entry_to_node(*slot) == canon)
-					continue;
-				iter->next_index = iter->index;
-				break;
-			}
-
 			if (likely(*slot))
 				return slot;
 			if (flags & RADIX_TREE_ITER_CONTIG) {
diff --git a/lib/Kconfig b/lib/Kconfig
index d79909dc01ec..61d55bd0ed89 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -362,9 +362,6 @@ config INTERVAL_TREE
 
 	  for more information.
 
-config RADIX_TREE_MULTIORDER
-	bool
-
 config ASSOCIATIVE_ARRAY
 	bool
 	help
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 1b7bf7314141..b8b12172bb60 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -76,21 +76,6 @@ static inline void *node_to_entry(void *ptr)
 
 #define RADIX_TREE_RETRY	node_to_entry(NULL)
 
-#ifdef CONFIG_RADIX_TREE_MULTIORDER
-/* Sibling slots point directly to another slot in the same node */
-static inline bool is_sibling_entry(struct radix_tree_node *parent, void *node)
-{
-	void **ptr = node;
-	return (parent->slots <= ptr) &&
-			(ptr < parent->slots + RADIX_TREE_MAP_SIZE);
-}
-#else
-static inline bool is_sibling_entry(struct radix_tree_node *parent, void *node)
-{
-	return false;
-}
-#endif
-
 static inline unsigned long get_slot_offset(struct radix_tree_node *parent,
 						 void **slot)
 {
@@ -103,16 +88,6 @@ static unsigned int radix_tree_descend(struct radix_tree_node *parent,
 	unsigned int offset = (index >> parent->shift) & RADIX_TREE_MAP_MASK;
 	void **entry = rcu_dereference_raw(parent->slots[offset]);
 
-#ifdef CONFIG_RADIX_TREE_MULTIORDER
-	if (radix_tree_is_internal_node(entry)) {
-		unsigned long siboff = get_slot_offset(parent, entry);
-		if (siboff < RADIX_TREE_MAP_SIZE) {
-			offset = siboff;
-			entry = rcu_dereference_raw(parent->slots[offset]);
-		}
-	}
-#endif
-
 	*nodep = (void *)entry;
 	return offset;
 }
@@ -231,12 +206,7 @@ static void dump_node(struct radix_tree_node *node, unsigned long index)
 		void *entry = node->slots[i];
 		if (!entry)
 			continue;
-		if (is_sibling_entry(node, entry)) {
-			pr_debug("radix sblng %p offset %ld val %p indices %ld-%ld\n",
-					entry, i,
-					*(void **)entry_to_node(entry),
-					first, last);
-		} else if (!radix_tree_is_internal_node(entry)) {
+		if (!radix_tree_is_internal_node(entry)) {
 			pr_debug("radix entry %p offset %ld indices %ld-%ld\n",
 					entry, i, first, last);
 		} else {
@@ -551,29 +521,28 @@ out:
  *	Returns -ENOMEM, or 0 for success.
  */
 int __radix_tree_create(struct radix_tree_root *root, unsigned long index,
-			unsigned order, struct radix_tree_node **nodep,
+			struct radix_tree_node **nodep,
 			void ***slotp)
 {
 	struct radix_tree_node *node = NULL, *child;
 	void **slot = (void **)&root->rnode;
 	unsigned long maxindex;
 	unsigned int shift, offset = 0;
-	unsigned long max = index | ((1UL << order) - 1);
 
 	shift = radix_tree_load_root(root, &child, &maxindex);
 
 	/* Make sure the tree is high enough.  */
-	if (max > maxindex) {
-		int error = radix_tree_extend(root, max, shift);
+	if (index > maxindex) {
+		int error = radix_tree_extend(root, index, shift);
 		if (error < 0)
 			return error;
 		shift = error;
 		child = root->rnode;
-		if (order == shift)
+		if (!shift)
 			shift += RADIX_TREE_MAP_SHIFT;
 	}
 
-	while (shift > order) {
+	while (shift > 0) {
 		shift -= RADIX_TREE_MAP_SHIFT;
 		if (child == NULL) {
 			/* Have to add a child node.  */
@@ -595,25 +564,6 @@ int __radix_tree_create(struct radix_tree_root *root, unsigned long index,
 		slot = &node->slots[offset];
 	}
 
-#ifdef CONFIG_RADIX_TREE_MULTIORDER
-	/* Insert pointers to the canonical entry */
-	if (order > shift) {
-		unsigned i, n = 1 << (order - shift);
-		offset = offset & ~(n - 1);
-		slot = &node->slots[offset];
-		child = node_to_entry(slot);
-		for (i = 0; i < n; i++) {
-			if (slot[i])
-				return -EEXIST;
-		}
-
-		for (i = 1; i < n; i++) {
-			rcu_assign_pointer(slot[i], child);
-			node->count++;
-		}
-	}
-#endif
-
 	if (nodep)
 		*nodep = node;
 	if (slotp)
@@ -630,8 +580,7 @@ int __radix_tree_create(struct radix_tree_root *root, unsigned long index,
  *
  *	Insert an item into the radix tree at position @index.
  */
-int __radix_tree_insert(struct radix_tree_root *root, unsigned long index,
-			unsigned order, void *item)
+int radix_tree_insert(struct radix_tree_root *root, unsigned long index, void *item)
 {
 	struct radix_tree_node *node;
 	void **slot;
@@ -639,7 +588,7 @@ int __radix_tree_insert(struct radix_tree_root *root, unsigned long index,
 
 	BUG_ON(radix_tree_is_internal_node(item));
 
-	error = __radix_tree_create(root, index, order, &node, &slot);
+	error = __radix_tree_create(root, index, &node, &slot);
 	if (error)
 		return error;
 	if (*slot != NULL)
@@ -658,7 +607,7 @@ int __radix_tree_insert(struct radix_tree_root *root, unsigned long index,
 
 	return 0;
 }
-EXPORT_SYMBOL(__radix_tree_insert);
+EXPORT_SYMBOL(radix_tree_insert);
 
 /**
  *	__radix_tree_lookup	-	lookup an item in a radix tree
@@ -894,14 +843,6 @@ int radix_tree_tag_get(struct radix_tree_root *root,
 }
 EXPORT_SYMBOL(radix_tree_tag_get);
 
-static inline void __set_iter_shift(struct radix_tree_iter *iter,
-					unsigned int shift)
-{
-#ifdef CONFIG_RADIX_TREE_MULTIORDER
-	iter->shift = shift;
-#endif
-}
-
 /**
  * radix_tree_next_chunk - find next chunk of slots for iteration
  *
@@ -945,7 +886,6 @@ void **radix_tree_next_chunk(struct radix_tree_root *root,
 		iter->index = index;
 		iter->next_index = maxindex + 1;
 		iter->tags = 1;
-		__set_iter_shift(iter, 0);
 		return (void **)&root->rnode;
 	}
 
@@ -967,8 +907,6 @@ void **radix_tree_next_chunk(struct radix_tree_root *root,
 			else
 				while (++offset	< RADIX_TREE_MAP_SIZE) {
 					void *slot = node->slots[offset];
-					if (is_sibling_entry(node, slot))
-						continue;
 					if (slot)
 						break;
 				}
@@ -989,7 +927,6 @@ void **radix_tree_next_chunk(struct radix_tree_root *root,
 	/* Update the iterator state */
 	iter->index = (index &~ node_maxindex(node)) | (offset << node->shift);
 	iter->next_index = (index | node_maxindex(node)) + 1;
-	__set_iter_shift(iter, node->shift);
 
 	/* Construct iter->tags bit-mask from node->tags[tag] array */
 	if (flags & RADIX_TREE_ITER_TAGGED) {
@@ -1112,8 +1049,6 @@ unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root *root,
 			node = node->parent;
 			offset = (index >> node->shift) & RADIX_TREE_MAP_MASK;
 		}
-		if (is_sibling_entry(node, node->slots[offset]))
-			goto next;
 		if (tagged >= nr_to_tag)
 			break;
 	}
@@ -1329,8 +1264,6 @@ static unsigned long __locate(struct radix_tree_node *slot, void *item,
 				continue;
 			}
 			node = entry_to_node(node);
-			if (is_sibling_entry(slot, node))
-				continue;
 			slot = node;
 			break;
 		}
@@ -1505,20 +1438,6 @@ bool __radix_tree_delete_node(struct radix_tree_root *root,
 	return deleted;
 }
 
-static inline void delete_sibling_entries(struct radix_tree_node *node,
-					void *ptr, unsigned offset)
-{
-#ifdef CONFIG_RADIX_TREE_MULTIORDER
-	int i;
-	for (i = 1; offset + i < RADIX_TREE_MAP_SIZE; i++) {
-		if (node->slots[offset + i] != ptr)
-			break;
-		node->slots[offset + i] = NULL;
-		node->count--;
-	}
-#endif
-}
-
 /**
  *	radix_tree_delete_item    -    delete an item from a radix tree
  *	@root:		radix tree root
@@ -1558,7 +1477,6 @@ void *radix_tree_delete_item(struct radix_tree_root *root,
 	for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++)
 		node_tag_clear(root, node, tag, offset);
 
-	delete_sibling_entries(node, node_to_entry(slot), offset);
 	node->slots[offset] = NULL;
 	node->count--;
 
diff --git a/mm/Kconfig b/mm/Kconfig
index be0ee11fa0d9..57fad7262c42 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -412,7 +412,6 @@ config TRANSPARENT_HUGEPAGE
 	bool "Transparent Hugepage Support"
 	depends on HAVE_ARCH_TRANSPARENT_HUGEPAGE
 	select COMPACTION
-	select RADIX_TREE_MULTIORDER
 	help
 	  Transparent Hugepages allows the kernel to use huge pages and
 	  huge tlb transparently to the applications whenever possible.
diff --git a/mm/filemap.c b/mm/filemap.c
index 8a287dfc5372..3046d7583267 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -591,7 +591,7 @@ static int page_cache_tree_insert(struct address_space *mapping,
 	void **slot;
 	int error;
 
-	error = __radix_tree_create(&mapping->page_tree, page->index, 0,
+	error = __radix_tree_create(&mapping->page_tree, page->index,
 				    &node, &slot);
 	if (error)
 		return error;
diff --git a/tools/testing/radix-tree/Makefile b/tools/testing/radix-tree/Makefile
index 3b530467148e..43febba864bd 100644
--- a/tools/testing/radix-tree/Makefile
+++ b/tools/testing/radix-tree/Makefile
@@ -3,7 +3,7 @@ CFLAGS += -I. -g -Wall -D_LGPL_SOURCE
 LDFLAGS += -lpthread -lurcu
 TARGETS = main
 OFILES = main.o radix-tree.o linux.o test.o tag_check.o find_next_bit.o \
-	 regression1.o regression2.o regression3.o multiorder.o
+	 regression1.o regression2.o regression3.o
 
 targets: $(TARGETS)
 
diff --git a/tools/testing/radix-tree/generated/autoconf.h b/tools/testing/radix-tree/generated/autoconf.h
index ad18cf5a2a3a..a6b5fba4c3e0 100644
--- a/tools/testing/radix-tree/generated/autoconf.h
+++ b/tools/testing/radix-tree/generated/autoconf.h
@@ -1,3 +1,2 @@
-#define CONFIG_RADIX_TREE_MULTIORDER 1
 #define CONFIG_SHMEM 1
 #define CONFIG_SWAP 1
diff --git a/tools/testing/radix-tree/main.c b/tools/testing/radix-tree/main.c
index b7619ff3b552..bdee099410f0 100644
--- a/tools/testing/radix-tree/main.c
+++ b/tools/testing/radix-tree/main.c
@@ -232,18 +232,17 @@ void copy_tag_check(void)
 	item_kill_tree(&tree);
 }
 
-static void __locate_check(struct radix_tree_root *tree, unsigned long index,
-			unsigned order)
+static void __locate_check(struct radix_tree_root *tree, unsigned long index)
 {
 	struct item *item;
 	unsigned long index2;
 
-	item_insert_order(tree, index, order);
+	item_insert(tree, index);
 	item = item_lookup(tree, index);
 	index2 = radix_tree_locate_item(tree, item);
 	if (index != index2) {
-		printf("index %ld order %d inserted; found %ld\n",
-			index, order, index2);
+		printf("index %ld inserted; found %ld\n",
+			index, index2);
 		abort();
 	}
 }
@@ -254,7 +253,7 @@ static void __order_0_locate_check(void)
 	int i;
 
 	for (i = 0; i < 50; i++)
-		__locate_check(&tree, rand() % INT_MAX, 0);
+		__locate_check(&tree, rand() % INT_MAX);
 
 	item_kill_tree(&tree);
 }
@@ -262,28 +261,23 @@ static void __order_0_locate_check(void)
 static void locate_check(void)
 {
 	RADIX_TREE(tree, GFP_KERNEL);
-	unsigned order;
 	unsigned long offset, index;
 
 	__order_0_locate_check();
 
-	for (order = 0; order < 20; order++) {
-		for (offset = 0; offset < (1 << (order + 3));
-		     offset += (1UL << order)) {
-			for (index = 0; index < (1UL << (order + 5));
-			     index += (1UL << order)) {
-				__locate_check(&tree, index + offset, order);
-			}
-			if (radix_tree_locate_item(&tree, &tree) != -1)
-				abort();
-
-			item_kill_tree(&tree);
+	for (offset = 0; offset < (1 << 3); offset++) {
+		for (index = 0; index < (1UL << 5); index++) {
+			__locate_check(&tree, index + offset);
 		}
+		if (radix_tree_locate_item(&tree, &tree) != -1)
+			abort();
+
+		item_kill_tree(&tree);
 	}
 
 	if (radix_tree_locate_item(&tree, &tree) != -1)
 		abort();
-	__locate_check(&tree, -1, 0);
+	__locate_check(&tree, -1);
 	if (radix_tree_locate_item(&tree, &tree) != -1)
 		abort();
 	item_kill_tree(&tree);
@@ -294,8 +288,6 @@ static void single_thread_tests(bool long_run)
 	int i;
 
 	printf("starting single_thread_tests: %d allocated\n", nr_allocated);
-	multiorder_checks();
-	printf("after multiorder_check: %d allocated\n", nr_allocated);
 	locate_check();
 	printf("after locate_check: %d allocated\n", nr_allocated);
 	tag_check();
diff --git a/tools/testing/radix-tree/multiorder.c b/tools/testing/radix-tree/multiorder.c
deleted file mode 100644
index 39d9b9568fe2..000000000000
--- a/tools/testing/radix-tree/multiorder.c
+++ /dev/null
@@ -1,337 +0,0 @@
-/*
- * multiorder.c: Multi-order radix tree entry testing
- * Copyright (c) 2016 Intel Corporation
- * Author: Ross Zwisler <ross.zwisler@linux.intel.com>
- * Author: Matthew Wilcox <matthew.r.wilcox@intel.com>
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
- * more details.
- */
-#include <linux/radix-tree.h>
-#include <linux/slab.h>
-#include <linux/errno.h>
-
-#include "test.h"
-
-#define for_each_index(i, base, order) \
-	for (i = base; i < base + (1 << order); i++)
-
-static void __multiorder_tag_test(int index, int order)
-{
-	RADIX_TREE(tree, GFP_KERNEL);
-	int base, err, i;
-	unsigned long first = 0;
-
-	/* our canonical entry */
-	base = index & ~((1 << order) - 1);
-
-	printf("Multiorder tag test with index %d, canonical entry %d\n",
-			index, base);
-
-	err = item_insert_order(&tree, index, order);
-	assert(!err);
-
-	/*
-	 * Verify we get collisions for covered indices.  We try and fail to
-	 * insert an exceptional entry so we don't leak memory via
-	 * item_insert_order().
-	 */
-	for_each_index(i, base, order) {
-		err = __radix_tree_insert(&tree, i, order,
-				(void *)(0xA0 | RADIX_TREE_EXCEPTIONAL_ENTRY));
-		assert(err == -EEXIST);
-	}
-
-	for_each_index(i, base, order) {
-		assert(!radix_tree_tag_get(&tree, i, 0));
-		assert(!radix_tree_tag_get(&tree, i, 1));
-	}
-
-	assert(radix_tree_tag_set(&tree, index, 0));
-
-	for_each_index(i, base, order) {
-		assert(radix_tree_tag_get(&tree, i, 0));
-		assert(!radix_tree_tag_get(&tree, i, 1));
-	}
-
-	assert(radix_tree_range_tag_if_tagged(&tree, &first, ~0UL, 10, 0, 1) == 1);
-	assert(radix_tree_tag_clear(&tree, index, 0));
-
-	for_each_index(i, base, order) {
-		assert(!radix_tree_tag_get(&tree, i, 0));
-		assert(radix_tree_tag_get(&tree, i, 1));
-	}
-
-	assert(radix_tree_tag_clear(&tree, index, 1));
-
-	assert(!radix_tree_tagged(&tree, 0));
-	assert(!radix_tree_tagged(&tree, 1));
-
-	item_kill_tree(&tree);
-}
-
-static void multiorder_tag_tests(void)
-{
-	/* test multi-order entry for indices 0-7 with no sibling pointers */
-	__multiorder_tag_test(0, 3);
-	__multiorder_tag_test(5, 3);
-
-	/* test multi-order entry for indices 8-15 with no sibling pointers */
-	__multiorder_tag_test(8, 3);
-	__multiorder_tag_test(15, 3);
-
-	/*
-	 * Our order 5 entry covers indices 0-31 in a tree with height=2.
-	 * This is broken up as follows:
-	 * 0-7:		canonical entry
-	 * 8-15:	sibling 1
-	 * 16-23:	sibling 2
-	 * 24-31:	sibling 3
-	 */
-	__multiorder_tag_test(0, 5);
-	__multiorder_tag_test(29, 5);
-
-	/* same test, but with indices 32-63 */
-	__multiorder_tag_test(32, 5);
-	__multiorder_tag_test(44, 5);
-
-	/*
-	 * Our order 8 entry covers indices 0-255 in a tree with height=3.
-	 * This is broken up as follows:
-	 * 0-63:	canonical entry
-	 * 64-127:	sibling 1
-	 * 128-191:	sibling 2
-	 * 192-255:	sibling 3
-	 */
-	__multiorder_tag_test(0, 8);
-	__multiorder_tag_test(190, 8);
-
-	/* same test, but with indices 256-511 */
-	__multiorder_tag_test(256, 8);
-	__multiorder_tag_test(300, 8);
-
-	__multiorder_tag_test(0x12345678UL, 8);
-}
-
-static void multiorder_check(unsigned long index, int order)
-{
-	unsigned long i;
-	unsigned long min = index & ~((1UL << order) - 1);
-	unsigned long max = min + (1UL << order);
-	RADIX_TREE(tree, GFP_KERNEL);
-
-	printf("Multiorder index %ld, order %d\n", index, order);
-
-	assert(item_insert_order(&tree, index, order) == 0);
-
-	for (i = min; i < max; i++) {
-		struct item *item = item_lookup(&tree, i);
-		assert(item != 0);
-		assert(item->index == index);
-	}
-	for (i = 0; i < min; i++)
-		item_check_absent(&tree, i);
-	for (i = max; i < 2*max; i++)
-		item_check_absent(&tree, i);
-	for (i = min; i < max; i++) {
-		static void *entry = (void *)
-					(0xA0 | RADIX_TREE_EXCEPTIONAL_ENTRY);
-		assert(radix_tree_insert(&tree, i, entry) == -EEXIST);
-	}
-
-	assert(item_delete(&tree, index) != 0);
-
-	for (i = 0; i < 2*max; i++)
-		item_check_absent(&tree, i);
-}
-
-static void multiorder_shrink(unsigned long index, int order)
-{
-	unsigned long i;
-	unsigned long max = 1 << order;
-	RADIX_TREE(tree, GFP_KERNEL);
-	struct radix_tree_node *node;
-
-	printf("Multiorder shrink index %ld, order %d\n", index, order);
-
-	assert(item_insert_order(&tree, 0, order) == 0);
-
-	node = tree.rnode;
-
-	assert(item_insert(&tree, index) == 0);
-	assert(node != tree.rnode);
-
-	assert(item_delete(&tree, index) != 0);
-	assert(node == tree.rnode);
-
-	for (i = 0; i < max; i++) {
-		struct item *item = item_lookup(&tree, i);
-		assert(item != 0);
-		assert(item->index == 0);
-	}
-	for (i = max; i < 2*max; i++)
-		item_check_absent(&tree, i);
-
-	if (!item_delete(&tree, 0)) {
-		printf("failed to delete index %ld (order %d)\n", index, order);		abort();
-	}
-
-	for (i = 0; i < 2*max; i++)
-		item_check_absent(&tree, i);
-}
-
-static void multiorder_insert_bug(void)
-{
-	RADIX_TREE(tree, GFP_KERNEL);
-
-	item_insert(&tree, 0);
-	radix_tree_tag_set(&tree, 0, 0);
-	item_insert_order(&tree, 3 << 6, 6);
-
-	item_kill_tree(&tree);
-}
-
-void multiorder_iteration(void)
-{
-	RADIX_TREE(tree, GFP_KERNEL);
-	struct radix_tree_iter iter;
-	void **slot;
-	int i, j, err;
-
-	printf("Multiorder iteration test\n");
-
-#define NUM_ENTRIES 11
-	int index[NUM_ENTRIES] = {0, 2, 4, 8, 16, 32, 34, 36, 64, 72, 128};
-	int order[NUM_ENTRIES] = {1, 1, 2, 3,  4,  1,  0,  1,  3,  0, 7};
-
-	for (i = 0; i < NUM_ENTRIES; i++) {
-		err = item_insert_order(&tree, index[i], order[i]);
-		assert(!err);
-	}
-
-	for (j = 0; j < 256; j++) {
-		for (i = 0; i < NUM_ENTRIES; i++)
-			if (j <= (index[i] | ((1 << order[i]) - 1)))
-				break;
-
-		radix_tree_for_each_slot(slot, &tree, &iter, j) {
-			int height = order[i] / RADIX_TREE_MAP_SHIFT;
-			int shift = height * RADIX_TREE_MAP_SHIFT;
-			int mask = (1 << order[i]) - 1;
-
-			assert(iter.index >= (index[i] &~ mask));
-			assert(iter.index <= (index[i] | mask));
-			assert(iter.shift == shift);
-			i++;
-		}
-	}
-
-	item_kill_tree(&tree);
-}
-
-void multiorder_tagged_iteration(void)
-{
-	RADIX_TREE(tree, GFP_KERNEL);
-	struct radix_tree_iter iter;
-	void **slot;
-	unsigned long first = 0;
-	int i, j;
-
-	printf("Multiorder tagged iteration test\n");
-
-#define MT_NUM_ENTRIES 9
-	int index[MT_NUM_ENTRIES] = {0, 2, 4, 16, 32, 40, 64, 72, 128};
-	int order[MT_NUM_ENTRIES] = {1, 0, 2, 4,  3,  1,  3,  0,   7};
-
-#define TAG_ENTRIES 7
-	int tag_index[TAG_ENTRIES] = {0, 4, 16, 40, 64, 72, 128};
-
-	for (i = 0; i < MT_NUM_ENTRIES; i++)
-		assert(!item_insert_order(&tree, index[i], order[i]));
-
-	assert(!radix_tree_tagged(&tree, 1));
-
-	for (i = 0; i < TAG_ENTRIES; i++)
-		assert(radix_tree_tag_set(&tree, tag_index[i], 1));
-
-	for (j = 0; j < 256; j++) {
-		int mask, k;
-
-		for (i = 0; i < TAG_ENTRIES; i++) {
-			for (k = i; index[k] < tag_index[i]; k++)
-				;
-			if (j <= (index[k] | ((1 << order[k]) - 1)))
-				break;
-		}
-
-		radix_tree_for_each_tagged(slot, &tree, &iter, j, 1) {
-			for (k = i; index[k] < tag_index[i]; k++)
-				;
-			mask = (1 << order[k]) - 1;
-
-			assert(iter.index >= (tag_index[i] &~ mask));
-			assert(iter.index <= (tag_index[i] | mask));
-			i++;
-		}
-	}
-
-	radix_tree_range_tag_if_tagged(&tree, &first, ~0UL,
-					MT_NUM_ENTRIES, 1, 2);
-
-	for (j = 0; j < 256; j++) {
-		int mask, k;
-
-		for (i = 0; i < TAG_ENTRIES; i++) {
-			for (k = i; index[k] < tag_index[i]; k++)
-				;
-			if (j <= (index[k] | ((1 << order[k]) - 1)))
-				break;
-		}
-
-		radix_tree_for_each_tagged(slot, &tree, &iter, j, 2) {
-			for (k = i; index[k] < tag_index[i]; k++)
-				;
-			mask = (1 << order[k]) - 1;
-
-			assert(iter.index >= (tag_index[i] &~ mask));
-			assert(iter.index <= (tag_index[i] | mask));
-			i++;
-		}
-	}
-
-	first = 1;
-	radix_tree_range_tag_if_tagged(&tree, &first, ~0UL,
-					MT_NUM_ENTRIES, 1, 0);
-	i = 0;
-	radix_tree_for_each_tagged(slot, &tree, &iter, 0, 0) {
-		assert(iter.index == tag_index[i]);
-		i++;
-	}
-
-	item_kill_tree(&tree);
-}
-
-void multiorder_checks(void)
-{
-	int i;
-
-	for (i = 0; i < 20; i++) {
-		multiorder_check(200, i);
-		multiorder_check(0, i);
-		multiorder_check((1UL << i) + 1, i);
-	}
-
-	for (i = 0; i < 15; i++)
-		multiorder_shrink((1UL << (i + RADIX_TREE_MAP_SHIFT)), i);
-
-	multiorder_insert_bug();
-	multiorder_tag_tests();
-	multiorder_iteration();
-	multiorder_tagged_iteration();
-}
diff --git a/tools/testing/radix-tree/test.c b/tools/testing/radix-tree/test.c
index a6e8099eaf4f..81377b40af79 100644
--- a/tools/testing/radix-tree/test.c
+++ b/tools/testing/radix-tree/test.c
@@ -24,21 +24,9 @@ int item_tag_get(struct radix_tree_root *root, unsigned long index, int tag)
 	return radix_tree_tag_get(root, index, tag);
 }
 
-int __item_insert(struct radix_tree_root *root, struct item *item,
-			unsigned order)
-{
-	return __radix_tree_insert(root, item->index, order, item);
-}
-
 int item_insert(struct radix_tree_root *root, unsigned long index)
 {
-	return __item_insert(root, item_create(index), 0);
-}
-
-int item_insert_order(struct radix_tree_root *root, unsigned long index,
-			unsigned order)
-{
-	return __item_insert(root, item_create(index), order);
+	return radix_tree_insert(root, index, item_create(index));
 }
 
 int item_delete(struct radix_tree_root *root, unsigned long index)
diff --git a/tools/testing/radix-tree/test.h b/tools/testing/radix-tree/test.h
index e85131369723..7c2e290e4c3e 100644
--- a/tools/testing/radix-tree/test.h
+++ b/tools/testing/radix-tree/test.h
@@ -8,11 +8,7 @@ struct item {
 };
 
 struct item *item_create(unsigned long index);
-int __item_insert(struct radix_tree_root *root, struct item *item,
-			unsigned order);
 int item_insert(struct radix_tree_root *root, unsigned long index);
-int item_insert_order(struct radix_tree_root *root, unsigned long index,
-			unsigned order);
 int item_delete(struct radix_tree_root *root, unsigned long index);
 struct item *item_lookup(struct radix_tree_root *root, unsigned long index);
 
@@ -26,7 +22,6 @@ void item_full_scan(struct radix_tree_root *root, unsigned long start,
 void item_kill_tree(struct radix_tree_root *root);
 
 void tag_check(void);
-void multiorder_checks(void);
 
 struct item *
 item_tag_set(struct radix_tree_root *root, unsigned long index, int tag);

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] radix-tree: Fix optimisation problem
  2016-09-24 20:21         ` Linus Torvalds
@ 2016-09-24 21:04           ` Kirill A. Shutemov
  -1 siblings, 0 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2016-09-24 21:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matthew Wilcox, Matthew Wilcox, Andrew Morton,
	Konstantin Khlebnikov, Ross Zwisler, Linux Kernel Mailing List,
	linux-mm, linux-fsdevel

On Sat, Sep 24, 2016 at 01:21:36PM -0700, Linus Torvalds wrote:
> On Fri, Sep 23, 2016 at 1:16 PM, Matthew Wilcox <mawilcox@microsoft.com> wrote:
> >
> >  #ifdef CONFIG_RADIX_TREE_MULTIORDER
> >         if (radix_tree_is_internal_node(entry)) {
> > -               unsigned long siboff = get_slot_offset(parent, entry);
> > +               unsigned long siboff = get_slot_offset(parent,
> > +                                               (void **)entry_to_node(entry));
> 
> I feel that it is *this* part that I think needs a huge honking comment.
> 
> If you are going to make get_slot_offset() different, then you could
> just rewrite get_slot_offset() to do
> 
>         unsigned long diff = (unsigned long) slot - (unsigned
> long)parent->slots;
>         return diff / sizeof(void *);
> 
> and add a comment to say "don't do this as a pointer diff, because
> 'slot' may not be an aligned pointer". No BUG_ON() necessary, because
> it "just works".
> 
> At that point, gcc should just generate the right code, because it
> doesn't see it as a pointer subtraction followed by a pointer
> addition.
> 
> And yes, that crazy " (void **)entry_to_node(entry)" fixes it *too*,
> but it needs a *comment*.
> 
> Why is that special, when all the other uses of get_slot_offset()
> don't have that? *That* is what should be explained. Not some internal
> detail.
> 
> That said, if this code isn't even used, as Konstantin says (THP
> selects it - doesn't THP use it?), then the fix really should be to
> just remove the odd code instead of adding to it.
> 
> Looking around for uses that set "order" to anything but zero, I
> really don't see it. So maybe we should just do *that* trivial thing
> instead, and remove CONFIG_RADIX_TREE_MULTIORDER, since it's appears
> to be buggy and always has been.

Well, my ext4-with-huge-pages patchset[1] uses multi-order entries.
It also converts shmem-with-huge-pages and hugetlb to them.

I'm okay with converting it to other mechanism, but I need something.
(I looked into Konstantin's RFC patchset[2]. It looks okay, but I don't
feel myself qualified to review it as I don't know much about radix-tree
internals.)

[1] http://lkml.kernel.org/r/20160915115523.29737-1-kirill.shutemov@linux.intel.com
[2] http://lkml.kernel.org/r/147230727479.9957.1087787722571077339.stgit@zurg

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] radix-tree: Fix optimisation problem
@ 2016-09-24 21:04           ` Kirill A. Shutemov
  0 siblings, 0 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2016-09-24 21:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matthew Wilcox, Matthew Wilcox, Andrew Morton,
	Konstantin Khlebnikov, Ross Zwisler, Linux Kernel Mailing List,
	linux-mm, linux-fsdevel

On Sat, Sep 24, 2016 at 01:21:36PM -0700, Linus Torvalds wrote:
> On Fri, Sep 23, 2016 at 1:16 PM, Matthew Wilcox <mawilcox@microsoft.com> wrote:
> >
> >  #ifdef CONFIG_RADIX_TREE_MULTIORDER
> >         if (radix_tree_is_internal_node(entry)) {
> > -               unsigned long siboff = get_slot_offset(parent, entry);
> > +               unsigned long siboff = get_slot_offset(parent,
> > +                                               (void **)entry_to_node(entry));
> 
> I feel that it is *this* part that I think needs a huge honking comment.
> 
> If you are going to make get_slot_offset() different, then you could
> just rewrite get_slot_offset() to do
> 
>         unsigned long diff = (unsigned long) slot - (unsigned
> long)parent->slots;
>         return diff / sizeof(void *);
> 
> and add a comment to say "don't do this as a pointer diff, because
> 'slot' may not be an aligned pointer". No BUG_ON() necessary, because
> it "just works".
> 
> At that point, gcc should just generate the right code, because it
> doesn't see it as a pointer subtraction followed by a pointer
> addition.
> 
> And yes, that crazy " (void **)entry_to_node(entry)" fixes it *too*,
> but it needs a *comment*.
> 
> Why is that special, when all the other uses of get_slot_offset()
> don't have that? *That* is what should be explained. Not some internal
> detail.
> 
> That said, if this code isn't even used, as Konstantin says (THP
> selects it - doesn't THP use it?), then the fix really should be to
> just remove the odd code instead of adding to it.
> 
> Looking around for uses that set "order" to anything but zero, I
> really don't see it. So maybe we should just do *that* trivial thing
> instead, and remove CONFIG_RADIX_TREE_MULTIORDER, since it's appears
> to be buggy and always has been.

Well, my ext4-with-huge-pages patchset[1] uses multi-order entries.
It also converts shmem-with-huge-pages and hugetlb to them.

I'm okay with converting it to other mechanism, but I need something.
(I looked into Konstantin's RFC patchset[2]. It looks okay, but I don't
feel myself qualified to review it as I don't know much about radix-tree
internals.)

[1] http://lkml.kernel.org/r/20160915115523.29737-1-kirill.shutemov@linux.intel.com
[2] http://lkml.kernel.org/r/147230727479.9957.1087787722571077339.stgit@zurg

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] radix-tree: Fix optimisation problem
  2016-09-24 21:04           ` Kirill A. Shutemov
@ 2016-09-24 22:54             ` Linus Torvalds
  -1 siblings, 0 replies; 33+ messages in thread
From: Linus Torvalds @ 2016-09-24 22:54 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Matthew Wilcox, Matthew Wilcox, Andrew Morton,
	Konstantin Khlebnikov, Ross Zwisler, Linux Kernel Mailing List,
	linux-mm, linux-fsdevel

On Sat, Sep 24, 2016 at 2:04 PM, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> Well, my ext4-with-huge-pages patchset[1] uses multi-order entries.
> It also converts shmem-with-huge-pages and hugetlb to them.

Ok, so that code actually has a chance of being used. I guess we'll
not remove it. But I *would* like this subtle issue to have a comment
around that odd cast/and/mask thing.

            Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] radix-tree: Fix optimisation problem
@ 2016-09-24 22:54             ` Linus Torvalds
  0 siblings, 0 replies; 33+ messages in thread
From: Linus Torvalds @ 2016-09-24 22:54 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Matthew Wilcox, Matthew Wilcox, Andrew Morton,
	Konstantin Khlebnikov, Ross Zwisler, Linux Kernel Mailing List,
	linux-mm, linux-fsdevel

On Sat, Sep 24, 2016 at 2:04 PM, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> Well, my ext4-with-huge-pages patchset[1] uses multi-order entries.
> It also converts shmem-with-huge-pages and hugetlb to them.

Ok, so that code actually has a chance of being used. I guess we'll
not remove it. But I *would* like this subtle issue to have a comment
around that odd cast/and/mask thing.

            Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] radix-tree: Fix optimisation problem
  2016-09-22 18:53   ` Matthew Wilcox
@ 2016-09-24 23:35     ` Cedric Blancher
  -1 siblings, 0 replies; 33+ messages in thread
From: Cedric Blancher @ 2016-09-24 23:35 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Linus Torvalds, Kirill A. Shutemov, Andrew Morton,
	Konstantin Khlebnikov, Ross Zwisler, Linux Kernel Mailing List,
	Linux MM, linux-fsdevel, Matthew Wilcox

On 22 September 2016 at 20:53, Matthew Wilcox
<mawilcox@linuxonhyperv.com> wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
>
> When compiling the radix tree with -O2, GCC thinks it can optimise:
>
>         void *entry = parent->slots[offset];
>         int siboff = entry - parent->slots;

If entry is a pointer to void, how can you do pointer arithmetic with it?
Also, if you use pointer distances, the use of int is not valid, it
should then be ptrdiff_t siboff.

lint(1) would bite your arse off in both cases.
Sadly only UNIX (Solaris, AIX, ...) use lint(1) as mandatory part of
the build process and make warnings and errors of lint(1) fatal...

Ced
-- 
Cedric Blancher <cedric.blancher@gmail.com>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] radix-tree: Fix optimisation problem
@ 2016-09-24 23:35     ` Cedric Blancher
  0 siblings, 0 replies; 33+ messages in thread
From: Cedric Blancher @ 2016-09-24 23:35 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Linus Torvalds, Kirill A. Shutemov, Andrew Morton,
	Konstantin Khlebnikov, Ross Zwisler, Linux Kernel Mailing List,
	Linux MM, linux-fsdevel, Matthew Wilcox

On 22 September 2016 at 20:53, Matthew Wilcox
<mawilcox@linuxonhyperv.com> wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
>
> When compiling the radix tree with -O2, GCC thinks it can optimise:
>
>         void *entry = parent->slots[offset];
>         int siboff = entry - parent->slots;

If entry is a pointer to void, how can you do pointer arithmetic with it?
Also, if you use pointer distances, the use of int is not valid, it
should then be ptrdiff_t siboff.

lint(1) would bite your arse off in both cases.
Sadly only UNIX (Solaris, AIX, ...) use lint(1) as mandatory part of
the build process and make warnings and errors of lint(1) fatal...

Ced
-- 
Cedric Blancher <cedric.blancher@gmail.com>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] radix-tree: Fix optimisation problem
  2016-09-24 23:35     ` Cedric Blancher
@ 2016-09-25  0:18       ` Linus Torvalds
  -1 siblings, 0 replies; 33+ messages in thread
From: Linus Torvalds @ 2016-09-25  0:18 UTC (permalink / raw)
  To: Cedric Blancher
  Cc: Matthew Wilcox, Kirill A. Shutemov, Andrew Morton,
	Konstantin Khlebnikov, Ross Zwisler, Linux Kernel Mailing List,
	Linux MM, linux-fsdevel, Matthew Wilcox

On Sat, Sep 24, 2016 at 4:35 PM, Cedric Blancher
<cedric.blancher@gmail.com> wrote:
>>
>>         void *entry = parent->slots[offset];
>>         int siboff = entry - parent->slots;
>
> If entry is a pointer to void, how can you do pointer arithmetic with it?

It's actually void **.

(That said, gcc has an extension that considers "void *" to be a byte
pointer, so you can actually do arithmetic on them, and it acts like
"char *")

> Also, if you use pointer distances, the use of int is not valid, it
> should then be ptrdiff_t siboff.

The use of "int" is perfectly valid, since it's limited by
RADIX_TREE_MAP_SIZE, so it's going to be a small integer.

             Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] radix-tree: Fix optimisation problem
@ 2016-09-25  0:18       ` Linus Torvalds
  0 siblings, 0 replies; 33+ messages in thread
From: Linus Torvalds @ 2016-09-25  0:18 UTC (permalink / raw)
  To: Cedric Blancher
  Cc: Matthew Wilcox, Kirill A. Shutemov, Andrew Morton,
	Konstantin Khlebnikov, Ross Zwisler, Linux Kernel Mailing List,
	Linux MM, linux-fsdevel, Matthew Wilcox

On Sat, Sep 24, 2016 at 4:35 PM, Cedric Blancher
<cedric.blancher@gmail.com> wrote:
>>
>>         void *entry = parent->slots[offset];
>>         int siboff = entry - parent->slots;
>
> If entry is a pointer to void, how can you do pointer arithmetic with it?

It's actually void **.

(That said, gcc has an extension that considers "void *" to be a byte
pointer, so you can actually do arithmetic on them, and it acts like
"char *")

> Also, if you use pointer distances, the use of int is not valid, it
> should then be ptrdiff_t siboff.

The use of "int" is perfectly valid, since it's limited by
RADIX_TREE_MAP_SIZE, so it's going to be a small integer.

             Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] radix-tree: Fix optimisation problem
  2016-09-25  0:18       ` Linus Torvalds
@ 2016-09-25 17:59         ` Cedric Blancher
  -1 siblings, 0 replies; 33+ messages in thread
From: Cedric Blancher @ 2016-09-25 17:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matthew Wilcox, Kirill A. Shutemov, Andrew Morton,
	Konstantin Khlebnikov, Ross Zwisler, Linux Kernel Mailing List,
	Linux MM, linux-fsdevel, Matthew Wilcox

On 25 September 2016 at 02:18, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sat, Sep 24, 2016 at 4:35 PM, Cedric Blancher
> <cedric.blancher@gmail.com> wrote:
>>>
>>>         void *entry = parent->slots[offset];
>>>         int siboff = entry - parent->slots;
>>
>> If entry is a pointer to void, how can you do pointer arithmetic with it?
>
> It's actually void **.
>
> (That said, gcc has an extension that considers "void *" to be a byte
> pointer, so you can actually do arithmetic on them, and it acts like
> "char *")
>
>> Also, if you use pointer distances, the use of int is not valid, it
>> should then be ptrdiff_t siboff.
>
> The use of "int" is perfectly valid, since it's limited by
> RADIX_TREE_MAP_SIZE, so it's going to be a small integer.

A specific data type would be wise (aka radtr_mapsz_t) to prevent a
disaster as SystemV had early during development. It took AT&T TWO
fucking months to figure out that their avl tree implementation had a
small type problem with int vs long.
Since I'd expect no one cares I'm going to print this email so I can
send the scan as PDF each time you hit that problem in the future with
"told you so"


Ced
-- 
Cedric Blancher <cedric.blancher@gmail.com>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] radix-tree: Fix optimisation problem
@ 2016-09-25 17:59         ` Cedric Blancher
  0 siblings, 0 replies; 33+ messages in thread
From: Cedric Blancher @ 2016-09-25 17:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matthew Wilcox, Kirill A. Shutemov, Andrew Morton,
	Konstantin Khlebnikov, Ross Zwisler, Linux Kernel Mailing List,
	Linux MM, linux-fsdevel, Matthew Wilcox

On 25 September 2016 at 02:18, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sat, Sep 24, 2016 at 4:35 PM, Cedric Blancher
> <cedric.blancher@gmail.com> wrote:
>>>
>>>         void *entry = parent->slots[offset];
>>>         int siboff = entry - parent->slots;
>>
>> If entry is a pointer to void, how can you do pointer arithmetic with it?
>
> It's actually void **.
>
> (That said, gcc has an extension that considers "void *" to be a byte
> pointer, so you can actually do arithmetic on them, and it acts like
> "char *")
>
>> Also, if you use pointer distances, the use of int is not valid, it
>> should then be ptrdiff_t siboff.
>
> The use of "int" is perfectly valid, since it's limited by
> RADIX_TREE_MAP_SIZE, so it's going to be a small integer.

A specific data type would be wise (aka radtr_mapsz_t) to prevent a
disaster as SystemV had early during development. It took AT&T TWO
fucking months to figure out that their avl tree implementation had a
small type problem with int vs long.
Since I'd expect no one cares I'm going to print this email so I can
send the scan as PDF each time you hit that problem in the future with
"told you so"


Ced
-- 
Cedric Blancher <cedric.blancher@gmail.com>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] radix-tree: Fix optimisation problem
  2016-09-25 17:59         ` Cedric Blancher
  (?)
@ 2016-09-25 18:04         ` Linus Torvalds
  2016-09-25 19:04           ` Linus Torvalds
  -1 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2016-09-25 18:04 UTC (permalink / raw)
  To: Cedric Blancher
  Cc: Ross Zwisler, Matthew Wilcox, Konstantin Khlebnikov,
	Andrew Morton, Kirill A. Shutemov, linux-fsdevel, Linux MM,
	Linux Kernel Mailing List, Matthew Wilcox

[-- Attachment #1: Type: text/plain, Size: 772 bytes --]

On Sep 25, 2016 10:59 AM, "Cedric Blancher" <cedric.blancher@gmail.com>
wrote:
> >
> > The use of "int" is perfectly valid, since it's limited by
> > RADIX_TREE_MAP_SIZE, so it's going to be a small integer.
>
> A specific data type would be wise (aka radtr_mapsz_t) to prevent a
> disaster as SystemV had early during development.

Actually, you're right that the code is shit and shouldn't use an "int"
there.

The value range is indeed just up to RADIX_TREE_MAP_SIZE, but since the
code actually can get entries that are *not* sibling entries, it could
overflow

The more I look at that particular piece of code, the less I like it. It's
buggy shit. It needs to be rewritten entirely too actually check for
sibling entries, not that ad-hoc arithmetic crap.

     Linus

[-- Attachment #2: Type: text/html, Size: 1043 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] radix-tree: Fix optimisation problem
  2016-09-25 18:04         ` Linus Torvalds
@ 2016-09-25 19:04           ` Linus Torvalds
  2016-09-25 19:40               ` Cedric Blancher
  2016-09-25 19:56             ` Linus Torvalds
  0 siblings, 2 replies; 33+ messages in thread
From: Linus Torvalds @ 2016-09-25 19:04 UTC (permalink / raw)
  To: Cedric Blancher
  Cc: Ross Zwisler, Matthew Wilcox, Konstantin Khlebnikov,
	Andrew Morton, Kirill A. Shutemov, linux-fsdevel, Linux MM,
	Linux Kernel Mailing List, Matthew Wilcox

[-- Attachment #1: Type: text/plain, Size: 811 bytes --]

On Sun, Sep 25, 2016 at 11:04 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> The more I look at that particular piece of code, the less I like it. It's
> buggy shit. It needs to be rewritten entirely too actually check for sibling
> entries, not that ad-hoc arithmetic crap.

Here's my attempt at cleaning the mess up.

I'm not claiming it's perfect, but I think it's better. It gets rid of
the ad-hoc arithmetic in radix_tree_descend(), and just makes all that
be inside the is_sibling_entry() logic instead. Which got renamed and
made to actually return the main sibling. So now there is at least
only *one* piece of code that does that range comparison, and I don't
think there is any huge need to explain what's going on, because the
"magic" is unconditional.

Willy?

                 Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1766 bytes --]

 lib/radix-tree.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 1b7bf7314141..210709b07759 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -78,18 +78,20 @@ static inline void *node_to_entry(void *ptr)
 
 #ifdef CONFIG_RADIX_TREE_MULTIORDER
 /* Sibling slots point directly to another slot in the same node */
-static inline bool is_sibling_entry(struct radix_tree_node *parent, void *node)
+static inline void **get_sibling_entry(struct radix_tree_node *parent, void *node)
 {
-	void **ptr = node;
-	return (parent->slots <= ptr) &&
-			(ptr < parent->slots + RADIX_TREE_MAP_SIZE);
+	void **ptr = (void **) entry_to_node(node);
+	if ((parent->slots <= ptr) && (ptr < parent->slots + RADIX_TREE_MAP_SIZE))
+		return ptr;
+	return NULL;
 }
 #else
-static inline bool is_sibling_entry(struct radix_tree_node *parent, void *node)
+static inline void **get_sibling_entry(struct radix_tree_node *parent, void *node)
 {
-	return false;
+	return NULL;
 }
 #endif
+#define is_sibling_entry(parent, node) (get_sibling_entry(parent,node) != NULL)
 
 static inline unsigned long get_slot_offset(struct radix_tree_node *parent,
 						 void **slot)
@@ -105,10 +107,10 @@ static unsigned int radix_tree_descend(struct radix_tree_node *parent,
 
 #ifdef CONFIG_RADIX_TREE_MULTIORDER
 	if (radix_tree_is_internal_node(entry)) {
-		unsigned long siboff = get_slot_offset(parent, entry);
-		if (siboff < RADIX_TREE_MAP_SIZE) {
-			offset = siboff;
-			entry = rcu_dereference_raw(parent->slots[offset]);
+		void **sibentry = get_sibling_entry(parent, entry);
+		if (sibentry) {
+			offset = get_slot_offset(parent, sibentry);
+			entry = rcu_dereference_raw(*sibentry);
 		}
 	}
 #endif

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] radix-tree: Fix optimisation problem
  2016-09-25 19:04           ` Linus Torvalds
@ 2016-09-25 19:40               ` Cedric Blancher
  2016-09-25 19:56             ` Linus Torvalds
  1 sibling, 0 replies; 33+ messages in thread
From: Cedric Blancher @ 2016-09-25 19:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ross Zwisler, Matthew Wilcox, Konstantin Khlebnikov,
	Andrew Morton, Kirill A. Shutemov, linux-fsdevel, Linux MM,
	Linux Kernel Mailing List, Matthew Wilcox

LGTM, except that #define is_sibling_entry should be IS_SIBLING_ENTRY

Ced

On 25 September 2016 at 21:04, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sun, Sep 25, 2016 at 11:04 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> The more I look at that particular piece of code, the less I like it. It's
>> buggy shit. It needs to be rewritten entirely too actually check for sibling
>> entries, not that ad-hoc arithmetic crap.
>
> Here's my attempt at cleaning the mess up.
>
> I'm not claiming it's perfect, but I think it's better. It gets rid of
> the ad-hoc arithmetic in radix_tree_descend(), and just makes all that
> be inside the is_sibling_entry() logic instead. Which got renamed and
> made to actually return the main sibling. So now there is at least
> only *one* piece of code that does that range comparison, and I don't
> think there is any huge need to explain what's going on, because the
> "magic" is unconditional.
>
> Willy?
>
>                  Linus



-- 
Cedric Blancher <cedric.blancher@gmail.com>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] radix-tree: Fix optimisation problem
@ 2016-09-25 19:40               ` Cedric Blancher
  0 siblings, 0 replies; 33+ messages in thread
From: Cedric Blancher @ 2016-09-25 19:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ross Zwisler, Matthew Wilcox, Konstantin Khlebnikov,
	Andrew Morton, Kirill A. Shutemov, linux-fsdevel, Linux MM,
	Linux Kernel Mailing List, Matthew Wilcox

LGTM, except that #define is_sibling_entry should be IS_SIBLING_ENTRY

Ced

On 25 September 2016 at 21:04, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sun, Sep 25, 2016 at 11:04 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> The more I look at that particular piece of code, the less I like it. It's
>> buggy shit. It needs to be rewritten entirely too actually check for sibling
>> entries, not that ad-hoc arithmetic crap.
>
> Here's my attempt at cleaning the mess up.
>
> I'm not claiming it's perfect, but I think it's better. It gets rid of
> the ad-hoc arithmetic in radix_tree_descend(), and just makes all that
> be inside the is_sibling_entry() logic instead. Which got renamed and
> made to actually return the main sibling. So now there is at least
> only *one* piece of code that does that range comparison, and I don't
> think there is any huge need to explain what's going on, because the
> "magic" is unconditional.
>
> Willy?
>
>                  Linus



-- 
Cedric Blancher <cedric.blancher@gmail.com>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] radix-tree: Fix optimisation problem
  2016-09-25 19:04           ` Linus Torvalds
  2016-09-25 19:40               ` Cedric Blancher
@ 2016-09-25 19:56             ` Linus Torvalds
  2016-09-26 21:28               ` Matthew Wilcox
  1 sibling, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2016-09-25 19:56 UTC (permalink / raw)
  To: Cedric Blancher
  Cc: Ross Zwisler, Matthew Wilcox, Konstantin Khlebnikov,
	Andrew Morton, Kirill A. Shutemov, linux-fsdevel, Linux MM,
	Linux Kernel Mailing List, Matthew Wilcox

[-- Attachment #1: Type: text/plain, Size: 1546 bytes --]

On Sun, Sep 25, 2016 at 12:04 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>        It gets rid of
> the ad-hoc arithmetic in radix_tree_descend(), and just makes all that
> be inside the is_sibling_entry() logic instead. Which got renamed and
> made to actually return the main sibling.

Sadly, it looks like gcc generates bad code for this approach. Looks
like it ends up testing the resulting sibling pointer twice (because
we explicitly disable -fno-delete-null-pointer-checks in the kernel,
and we have no way to say "look, I know this pointer I'm returning is
non-null").

So a smaller patch that keeps the old boolean "is_sibling_entry()" but
then actually *uses* that inside radix_tree_descend() and then tries
to make the nasty cast to "void **" more legible by making it use a
temporary variable seems to be a reasonable balance.

At least I feel like I can still read the code, but admittedly by now
that may be because I've stared at those few lines so much that I feel
like I know what's going on. So maybe the code isn't actually any more
legible after all.

.. and unlike my previous patch, it actually generates better code
than the original (while still passing the fixed test-suite, of
course). The reason seems to be exactly that temporary variable,
allowing us to just do

        entry = rcu_dereference_raw(*sibentry);

rather than doing

        entry = rcu_dereference_raw(parent->slots[offset]);

with the re-computed offset.

So I think I'll commit this unless somebody screams.

                     Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 771 bytes --]

 lib/radix-tree.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 1b7bf7314141..91f0727e3cad 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -105,10 +105,10 @@ static unsigned int radix_tree_descend(struct radix_tree_node *parent,
 
 #ifdef CONFIG_RADIX_TREE_MULTIORDER
 	if (radix_tree_is_internal_node(entry)) {
-		unsigned long siboff = get_slot_offset(parent, entry);
-		if (siboff < RADIX_TREE_MAP_SIZE) {
-			offset = siboff;
-			entry = rcu_dereference_raw(parent->slots[offset]);
+		if (is_sibling_entry(parent, entry)) {
+			void **sibentry = (void **) entry_to_node(entry);
+			offset = get_slot_offset(parent, sibentry);
+			entry = rcu_dereference_raw(*sibentry);
 		}
 	}
 #endif

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] radix-tree: Fix optimisation problem
  2016-09-24 22:54             ` Linus Torvalds
  (?)
@ 2016-09-26  4:26             ` Ross Zwisler
  -1 siblings, 0 replies; 33+ messages in thread
From: Ross Zwisler @ 2016-09-26  4:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kirill A. Shutemov, Matthew Wilcox, Matthew Wilcox,
	Andrew Morton, Konstantin Khlebnikov, Ross Zwisler,
	Linux Kernel Mailing List, linux-mm, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 606 bytes --]

On Saturday, September 24, 2016, Linus Torvalds <
torvalds@linux-foundation.org> wrote:

> On Sat, Sep 24, 2016 at 2:04 PM, Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com <javascript:;>> wrote:
> >
> > Well, my ext4-with-huge-pages patchset[1] uses multi-order entries.
> > It also converts shmem-with-huge-pages and hugetlb to them.
>
> Ok, so that code actually has a chance of being used. I guess we'll
> not remove it. But I *would* like this subtle issue to have a comment
> around that odd cast/and/mask thing.
>
>             Linus
>

My DAX PMD patches will use multi-order entries as well.

[-- Attachment #2: Type: text/html, Size: 969 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH 2/2] radix-tree: Fix optimisation problem
  2016-09-25 19:56             ` Linus Torvalds
@ 2016-09-26 21:28               ` Matthew Wilcox
  2016-09-26 21:48                   ` Cedric Blancher
  0 siblings, 1 reply; 33+ messages in thread
From: Matthew Wilcox @ 2016-09-26 21:28 UTC (permalink / raw)
  To: Linus Torvalds, Cedric Blancher
  Cc: Ross Zwisler, Matthew Wilcox, Konstantin Khlebnikov,
	Andrew Morton, Kirill A. Shutemov, linux-fsdevel, Linux MM,
	Linux Kernel Mailing List

From: linus971@gmail.com [mailto:linus971@gmail.com] On Behalf Of Linus Torvalds
> On Sun, Sep 25, 2016 at 12:04 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >        It gets rid of
> > the ad-hoc arithmetic in radix_tree_descend(), and just makes all that
> > be inside the is_sibling_entry() logic instead. Which got renamed and
> > made to actually return the main sibling.
> 
> Sadly, it looks like gcc generates bad code for this approach. Looks
> like it ends up testing the resulting sibling pointer twice (because
> we explicitly disable -fno-delete-null-pointer-checks in the kernel,
> and we have no way to say "look, I know this pointer I'm returning is
> non-null").
> 
> So a smaller patch that keeps the old boolean "is_sibling_entry()" but
> then actually *uses* that inside radix_tree_descend() and then tries
> to make the nasty cast to "void **" more legible by making it use a
> temporary variable seems to be a reasonable balance.
> 
> At least I feel like I can still read the code, but admittedly by now
> that may be because I've stared at those few lines so much that I feel
> like I know what's going on. So maybe the code isn't actually any more
> legible after all.
> 
> .. and unlike my previous patch, it actually generates better code
> than the original (while still passing the fixed test-suite, of
> course). The reason seems to be exactly that temporary variable,
> allowing us to just do
> 
>         entry = rcu_dereference_raw(*sibentry);
> 
> rather than doing
> 
>         entry = rcu_dereference_raw(parent->slots[offset]);
> 
> with the re-computed offset.
> 
> So I think I'll commit this unless somebody screams.

Acked-by: Matthew Wilcox <mawilcox@microsoft.com>

I don't love it.  But I think it's a reasonable fix for this point in the release cycle, and I have an idea for changing the representation of sibling slots that will make this moot.

(Basically adopting Konstantin's idea for using the *last* entry instead of the *first*, and then using entries of the form (offset << 2 | RADIX_TREE_INTERNAL_NODE), so we can identify sibling entries without knowing the parent pointer, and we can go straight from sibling entry to slot offset as a shift rather than as a pointer subtraction).

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] radix-tree: Fix optimisation problem
  2016-09-26 21:28               ` Matthew Wilcox
@ 2016-09-26 21:48                   ` Cedric Blancher
  0 siblings, 0 replies; 33+ messages in thread
From: Cedric Blancher @ 2016-09-26 21:48 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Linus Torvalds, Ross Zwisler, Matthew Wilcox,
	Konstantin Khlebnikov, Andrew Morton, Kirill A. Shutemov,
	linux-fsdevel, Linux MM, Linux Kernel Mailing List

You might also try to use valid, plain ISO C99 instead of perverted
gcc extensions which only cause a lot of trouble in the long run.

Ced

On 26 September 2016 at 23:28, Matthew Wilcox <mawilcox@microsoft.com> wrote:
> From: linus971@gmail.com [mailto:linus971@gmail.com] On Behalf Of Linus Torvalds
>> On Sun, Sep 25, 2016 at 12:04 PM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>> >        It gets rid of
>> > the ad-hoc arithmetic in radix_tree_descend(), and just makes all that
>> > be inside the is_sibling_entry() logic instead. Which got renamed and
>> > made to actually return the main sibling.
>>
>> Sadly, it looks like gcc generates bad code for this approach. Looks
>> like it ends up testing the resulting sibling pointer twice (because
>> we explicitly disable -fno-delete-null-pointer-checks in the kernel,
>> and we have no way to say "look, I know this pointer I'm returning is
>> non-null").
>>
>> So a smaller patch that keeps the old boolean "is_sibling_entry()" but
>> then actually *uses* that inside radix_tree_descend() and then tries
>> to make the nasty cast to "void **" more legible by making it use a
>> temporary variable seems to be a reasonable balance.
>>
>> At least I feel like I can still read the code, but admittedly by now
>> that may be because I've stared at those few lines so much that I feel
>> like I know what's going on. So maybe the code isn't actually any more
>> legible after all.
>>
>> .. and unlike my previous patch, it actually generates better code
>> than the original (while still passing the fixed test-suite, of
>> course). The reason seems to be exactly that temporary variable,
>> allowing us to just do
>>
>>         entry = rcu_dereference_raw(*sibentry);
>>
>> rather than doing
>>
>>         entry = rcu_dereference_raw(parent->slots[offset]);
>>
>> with the re-computed offset.
>>
>> So I think I'll commit this unless somebody screams.
>
> Acked-by: Matthew Wilcox <mawilcox@microsoft.com>
>
> I don't love it.  But I think it's a reasonable fix for this point in the release cycle, and I have an idea for changing the representation of sibling slots that will make this moot.
>
> (Basically adopting Konstantin's idea for using the *last* entry instead of the *first*, and then using entries of the form (offset << 2 | RADIX_TREE_INTERNAL_NODE), so we can identify sibling entries without knowing the parent pointer, and we can go straight from sibling entry to slot offset as a shift rather than as a pointer subtraction).



-- 
Cedric Blancher <cedric.blancher@gmail.com>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] radix-tree: Fix optimisation problem
@ 2016-09-26 21:48                   ` Cedric Blancher
  0 siblings, 0 replies; 33+ messages in thread
From: Cedric Blancher @ 2016-09-26 21:48 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Linus Torvalds, Ross Zwisler, Matthew Wilcox,
	Konstantin Khlebnikov, Andrew Morton, Kirill A. Shutemov,
	linux-fsdevel, Linux MM, Linux Kernel Mailing List

You might also try to use valid, plain ISO C99 instead of perverted
gcc extensions which only cause a lot of trouble in the long run.

Ced

On 26 September 2016 at 23:28, Matthew Wilcox <mawilcox@microsoft.com> wrote:
> From: linus971@gmail.com [mailto:linus971@gmail.com] On Behalf Of Linus Torvalds
>> On Sun, Sep 25, 2016 at 12:04 PM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>> >        It gets rid of
>> > the ad-hoc arithmetic in radix_tree_descend(), and just makes all that
>> > be inside the is_sibling_entry() logic instead. Which got renamed and
>> > made to actually return the main sibling.
>>
>> Sadly, it looks like gcc generates bad code for this approach. Looks
>> like it ends up testing the resulting sibling pointer twice (because
>> we explicitly disable -fno-delete-null-pointer-checks in the kernel,
>> and we have no way to say "look, I know this pointer I'm returning is
>> non-null").
>>
>> So a smaller patch that keeps the old boolean "is_sibling_entry()" but
>> then actually *uses* that inside radix_tree_descend() and then tries
>> to make the nasty cast to "void **" more legible by making it use a
>> temporary variable seems to be a reasonable balance.
>>
>> At least I feel like I can still read the code, but admittedly by now
>> that may be because I've stared at those few lines so much that I feel
>> like I know what's going on. So maybe the code isn't actually any more
>> legible after all.
>>
>> .. and unlike my previous patch, it actually generates better code
>> than the original (while still passing the fixed test-suite, of
>> course). The reason seems to be exactly that temporary variable,
>> allowing us to just do
>>
>>         entry = rcu_dereference_raw(*sibentry);
>>
>> rather than doing
>>
>>         entry = rcu_dereference_raw(parent->slots[offset]);
>>
>> with the re-computed offset.
>>
>> So I think I'll commit this unless somebody screams.
>
> Acked-by: Matthew Wilcox <mawilcox@microsoft.com>
>
> I don't love it.  But I think it's a reasonable fix for this point in the release cycle, and I have an idea for changing the representation of sibling slots that will make this moot.
>
> (Basically adopting Konstantin's idea for using the *last* entry instead of the *first*, and then using entries of the form (offset << 2 | RADIX_TREE_INTERNAL_NODE), so we can identify sibling entries without knowing the parent pointer, and we can go straight from sibling entry to slot offset as a shift rather than as a pointer subtraction).



-- 
Cedric Blancher <cedric.blancher@gmail.com>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2016-09-26 21:48 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-22 18:53 [PATCH 0/2] Fix radix_tree_lookup_slot() Matthew Wilcox
2016-09-22 18:53 ` Matthew Wilcox
2016-09-22 18:53 ` [PATCH 1/2] radix tree test suite: Test radix_tree_replace_slot() for multiorder entries Matthew Wilcox
2016-09-22 18:53   ` Matthew Wilcox
2016-09-22 18:53 ` [PATCH 2/2] radix-tree: Fix optimisation problem Matthew Wilcox
2016-09-22 18:53   ` Matthew Wilcox
2016-09-22 18:09   ` Linus Torvalds
2016-09-22 18:09     ` Linus Torvalds
2016-09-23 20:16     ` Matthew Wilcox
2016-09-24 20:21       ` Linus Torvalds
2016-09-24 20:21         ` Linus Torvalds
2016-09-24 20:47         ` Linus Torvalds
2016-09-24 21:04         ` Kirill A. Shutemov
2016-09-24 21:04           ` Kirill A. Shutemov
2016-09-24 22:54           ` Linus Torvalds
2016-09-24 22:54             ` Linus Torvalds
2016-09-26  4:26             ` Ross Zwisler
2016-09-24  8:36   ` Konstantin Khlebnikov
2016-09-24  8:36     ` Konstantin Khlebnikov
2016-09-24 23:35   ` Cedric Blancher
2016-09-24 23:35     ` Cedric Blancher
2016-09-25  0:18     ` Linus Torvalds
2016-09-25  0:18       ` Linus Torvalds
2016-09-25 17:59       ` Cedric Blancher
2016-09-25 17:59         ` Cedric Blancher
2016-09-25 18:04         ` Linus Torvalds
2016-09-25 19:04           ` Linus Torvalds
2016-09-25 19:40             ` Cedric Blancher
2016-09-25 19:40               ` Cedric Blancher
2016-09-25 19:56             ` Linus Torvalds
2016-09-26 21:28               ` Matthew Wilcox
2016-09-26 21:48                 ` Cedric Blancher
2016-09-26 21:48                   ` Cedric Blancher

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.