* [PATCH] memory subsystem: cache memory blocks in radix tree to accelerate lookup
@ 2019-11-20 19:25 Scott Cheloha
  2019-11-21  9:34 ` David Hildenbrand
                   ` (2 more replies)
From: Scott Cheloha @ 2019-11-20 19:25 UTC
  To: linux-kernel, Rafael J. Wysocki, Greg Kroah-Hartman, David Hildenbrand
  Cc: nathanl, ricklind, Scott Cheloha

Searching for a particular memory block by id is slow because each block
device is kept in an unsorted linked list on the subsystem bus.

Lookup is much faster if we cache the blocks in a radix tree.  Memory
subsystem initialization and hotplug/hotunplug are at least a little faster
for any machine with more than ~100 blocks, and the speedup grows with
the block count.

Signed-off-by: Scott Cheloha <cheloha@linux.vnet.ibm.com>
---
On a 40GB Power9 VM I'm seeing nice initialization speed improvements
consistent with the change in data structure.  Here's the per-block
timings before the patch:

# uptime        elapsed         block-id
0.005121        0.000033        0
0.005154        0.000028        1
0.005182        0.000035        2
0.005217        0.000030        3
0.005247        0.000039        4
0.005286        0.000031        5
0.005317        0.000030        6
0.005347        0.000031        7
0.005378        0.000030        8
0.005408        0.000031        9
[...]
0.091603        0.000143        999
0.091746        0.000175        1000
0.091921        0.000143        1001
0.092064        0.000142        1002
0.092206        0.000143        1003
0.092349        0.000143        1004
0.092492        0.000143        1005
0.092635        0.000144        1006
0.092779        0.000143        1007
0.092922        0.000144        1008
[...]
0.301879        0.000258        2038
0.302137        0.000267        2039
0.302404        0.000291        2040
0.302695        0.000259        2041
0.302954        0.000258        2042
0.303212        0.000259        2043
0.303471        0.000260        2044
0.303731        0.000258        2045
0.303989        0.000259        2046
0.304248        0.000260        2047

Obviously a linear growth with each block.

With the patch:

# uptime        elapsed         block-id
0.004701        0.000029        0
0.004730        0.000028        1
0.004758        0.000028        2
0.004786        0.000027        3
0.004813        0.000037        4
0.004850        0.000027        5
0.004877        0.000027        6
0.004904        0.000027        7
0.004931        0.000026        8
0.004957        0.000027        9
[...]
0.032718        0.000027        999
0.032745        0.000027        1000
0.032772        0.000026        1001
0.032798        0.000027        1002
0.032825        0.000027        1003
0.032852        0.000026        1004
0.032878        0.000027        1005
0.032905        0.000027        1006
0.032932        0.000026        1007
0.032958        0.000027        1008
[...]
0.061148        0.000027        2038
0.061175        0.000027        2039
0.061202        0.000026        2040
0.061228        0.000027        2041
0.061255        0.000027        2042
0.061282        0.000026        2043
0.061308        0.000027        2044
0.061335        0.000026        2045
0.061361        0.000026        2046
0.061387        0.000027        2047

It flattens out.

I'm seeing similar changes on my development PC but the numbers are
less drastic because the block size on a PC grows with the amount
of memory.  On powerpc the gains are a lot more visible because the
block size tops out at 256MB.
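
For reference, the numbers above come from ad-hoc instrumentation that is
not part of this patch.  A minimal sketch of the idea -- hypothetical code,
not what actually produced the tables -- timing each block registration
with local_clock():

	/*
	 * Illustrative only: wrap each memory block device registration
	 * and emit "uptime, elapsed, block-id" lines like those above.
	 * The 64-bit divisions are fine on the 64-bit targets measured here.
	 */
	u64 t0, t1;

	t0 = local_clock();			/* ns since boot */
	ret = register_memory(memory);
	t1 = local_clock();
	pr_info("%llu.%06llu\t%llu.%06llu\t%u\n",
		t1 / NSEC_PER_SEC, (t1 % NSEC_PER_SEC) / NSEC_PER_USEC,
		(t1 - t0) / NSEC_PER_SEC,
		((t1 - t0) % NSEC_PER_SEC) / NSEC_PER_USEC,
		memory->dev.id);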

 drivers/base/memory.c | 53 ++++++++++++++++++++++++++-----------------
 1 file changed, 32 insertions(+), 21 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 84c4e1f72cbd..fc0a4880c321 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -20,6 +20,7 @@
 #include <linux/memory_hotplug.h>
 #include <linux/mm.h>
 #include <linux/mutex.h>
+#include <linux/radix-tree.h>
 #include <linux/stat.h>
 #include <linux/slab.h>
 
@@ -59,6 +60,13 @@ static struct bus_type memory_subsys = {
 	.offline = memory_subsys_offline,
 };
 
+/*
+ * Memory blocks are cached in a local radix tree to avoid
+ * a costly linear search for the corresponding device on
+ * the subsystem bus.
+ */
+static RADIX_TREE(memory_block_tree, GFP_KERNEL);
+
 static BLOCKING_NOTIFIER_HEAD(memory_chain);
 
 int register_memory_notifier(struct notifier_block *nb)
@@ -580,20 +588,14 @@ int __weak arch_get_memory_phys_device(unsigned long start_pfn)
 /* A reference for the returned memory block device is acquired. */
 static struct memory_block *find_memory_block_by_id(unsigned long block_id)
 {
-	struct device *dev;
+	struct memory_block *mem;
 
-	dev = subsys_find_device_by_id(&memory_subsys, block_id, NULL);
-	return dev ? to_memory_block(dev) : NULL;
+	mem = radix_tree_lookup(&memory_block_tree, block_id);
+	if (mem)
+		get_device(&mem->dev);
+	return mem;
 }
 
-/*
- * For now, we have a linear search to go find the appropriate
- * memory_block corresponding to a particular phys_index. If
- * this gets to be a real problem, we can always use a radix
- * tree or something here.
- *
- * This could be made generic for all device subsystems.
- */
 struct memory_block *find_memory_block(struct mem_section *section)
 {
 	unsigned long block_id = base_memory_block_id(__section_nr(section));
@@ -636,9 +638,15 @@ int register_memory(struct memory_block *memory)
 	memory->dev.offline = memory->state == MEM_OFFLINE;
 
 	ret = device_register(&memory->dev);
-	if (ret)
+	if (ret) {
 		put_device(&memory->dev);
-
+		return ret;
+	}
+	ret = radix_tree_insert(&memory_block_tree, memory->dev.id, memory);
+	if (ret) {
+		put_device(&memory->dev);
+		device_unregister(&memory->dev);
+	}
 	return ret;
 }
 
@@ -696,6 +704,8 @@ static void unregister_memory(struct memory_block *memory)
 	if (WARN_ON_ONCE(memory->dev.bus != &memory_subsys))
 		return;
 
+	WARN_ON(radix_tree_delete(&memory_block_tree, memory->dev.id) == NULL);
+
 	/* drop the ref. we got via find_memory_block() */
 	put_device(&memory->dev);
 	device_unregister(&memory->dev);
@@ -851,20 +861,21 @@ void __init memory_dev_init(void)
 int walk_memory_blocks(unsigned long start, unsigned long size,
 		       void *arg, walk_memory_blocks_func_t func)
 {
-	const unsigned long start_block_id = phys_to_block_id(start);
-	const unsigned long end_block_id = phys_to_block_id(start + size - 1);
+	struct radix_tree_iter iter;
+	const unsigned long start_id = phys_to_block_id(start);
+	const unsigned long end_id = phys_to_block_id(start + size - 1);
 	struct memory_block *mem;
-	unsigned long block_id;
+	void **slot;
 	int ret = 0;
 
 	if (!size)
 		return 0;
 
-	for (block_id = start_block_id; block_id <= end_block_id; block_id++) {
-		mem = find_memory_block_by_id(block_id);
-		if (!mem)
-			continue;
-
+	radix_tree_for_each_slot(slot, &memory_block_tree, &iter, start_id) {
+		mem = radix_tree_deref_slot(slot);
+		if (mem->dev.id > end_id)
+			break;
+		get_device(&mem->dev);
 		ret = func(mem, arg);
 		put_device(&mem->dev);
 		if (ret)
-- 
2.24.0.rc1



* Re: [PATCH] memory subsystem: cache memory blocks in radix tree to accelerate lookup
  2019-11-20 19:25 [PATCH] memory subsystem: cache memory blocks in radix tree to accelerate lookup Scott Cheloha
@ 2019-11-21  9:34 ` David Hildenbrand
  2019-11-21  9:35 ` David Hildenbrand
  2019-11-21 19:59 ` [PATCH v2] drivers/base/memory.c: cache " Scott Cheloha
From: David Hildenbrand @ 2019-11-21  9:34 UTC
  To: Scott Cheloha, linux-kernel, Rafael J. Wysocki, Greg Kroah-Hartman
  Cc: nathanl, ricklind

On 20.11.19 20:25, Scott Cheloha wrote:
> Searching for a particular memory block by id is slow because each block
> device is kept in an unsorted linked list on the subsystem bus.
> 
> Lookup is much faster if we cache the blocks in a radix tree.  Memory
> subsystem initialization and hotplug/hotunplug are at least a little faster
> for any machine with more than ~100 blocks, and the speedup grows with
> the block count.
> 
> Signed-off-by: Scott Cheloha <cheloha@linux.vnet.ibm.com>
> ---
> On a 40GB Power9 VM I'm seeing nice initialization speed improvements
> consistent with the change in data structure.  Here's the per-block
> timings before the patch:
> 
> # uptime        elapsed         block-id
> 0.005121        0.000033        0
> 0.005154        0.000028        1
> 0.005182        0.000035        2
> 0.005217        0.000030        3
> 0.005247        0.000039        4
> 0.005286        0.000031        5
> 0.005317        0.000030        6
> 0.005347        0.000031        7
> 0.005378        0.000030        8
> 0.005408        0.000031        9
> [...]
> 0.091603        0.000143        999
> 0.091746        0.000175        1000
> 0.091921        0.000143        1001
> 0.092064        0.000142        1002
> 0.092206        0.000143        1003
> 0.092349        0.000143        1004
> 0.092492        0.000143        1005
> 0.092635        0.000144        1006
> 0.092779        0.000143        1007
> 0.092922        0.000144        1008
> [...]
> 0.301879        0.000258        2038
> 0.302137        0.000267        2039
> 0.302404        0.000291        2040
> 0.302695        0.000259        2041
> 0.302954        0.000258        2042
> 0.303212        0.000259        2043
> 0.303471        0.000260        2044
> 0.303731        0.000258        2045
> 0.303989        0.000259        2046
> 0.304248        0.000260        2047
> 
> Obviously a linear growth with each block.
> 
> With the patch:
> 
> # uptime        elapsed         block-id
> 0.004701        0.000029        0
> 0.004730        0.000028        1
> 0.004758        0.000028        2
> 0.004786        0.000027        3
> 0.004813        0.000037        4
> 0.004850        0.000027        5
> 0.004877        0.000027        6
> 0.004904        0.000027        7
> 0.004931        0.000026        8
> 0.004957        0.000027        9
> [...]
> 0.032718        0.000027        999
> 0.032745        0.000027        1000
> 0.032772        0.000026        1001
> 0.032798        0.000027        1002
> 0.032825        0.000027        1003
> 0.032852        0.000026        1004
> 0.032878        0.000027        1005
> 0.032905        0.000027        1006
> 0.032932        0.000026        1007
> 0.032958        0.000027        1008
> [...]
> 0.061148        0.000027        2038
> 0.061175        0.000027        2039
> 0.061202        0.000026        2040
> 0.061228        0.000027        2041
> 0.061255        0.000027        2042
> 0.061282        0.000026        2043
> 0.061308        0.000027        2044
> 0.061335        0.000026        2045
> 0.061361        0.000026        2046
> 0.061387        0.000027        2047
> 
> It flattens out.
> 
> I'm seeing similar changes on my development PC but the numbers are
> less drastic because the block size on a PC grows with the amount
> of memory.  On powerpc the gains are a lot more visible because the
> block size tops out at 256MB.

One could argue that this is only needed for bigger machines. If we ever 
care, we could defer filling/using the tree until a certain block count 
is reached. Future work if we really care.

> 
>   drivers/base/memory.c | 53 ++++++++++++++++++++++++++-----------------
>   1 file changed, 32 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 84c4e1f72cbd..fc0a4880c321 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -20,6 +20,7 @@
>   #include <linux/memory_hotplug.h>
>   #include <linux/mm.h>
>   #include <linux/mutex.h>
> +#include <linux/radix-tree.h>
>   #include <linux/stat.h>
>   #include <linux/slab.h>
>   
> @@ -59,6 +60,13 @@ static struct bus_type memory_subsys = {
>   	.offline = memory_subsys_offline,
>   };
>   
> +/*
> + * Memory blocks are cached in a local radix tree to avoid
> + * a costly linear search for the corresponding device on
> + * the subsystem bus.
> + */
> +static RADIX_TREE(memory_block_tree, GFP_KERNEL);

simply memory_blocks or memory_block_devices?

> +
>   static BLOCKING_NOTIFIER_HEAD(memory_chain);
>   
>   int register_memory_notifier(struct notifier_block *nb)
> @@ -580,20 +588,14 @@ int __weak arch_get_memory_phys_device(unsigned long start_pfn)
>   /* A reference for the returned memory block device is acquired. */
>   static struct memory_block *find_memory_block_by_id(unsigned long block_id)
>   {
> -	struct device *dev;
> +	struct memory_block *mem;
>   
> -	dev = subsys_find_device_by_id(&memory_subsys, block_id, NULL);
> -	return dev ? to_memory_block(dev) : NULL;
> +	mem = radix_tree_lookup(&memory_block_tree, block_id);
> +	if (mem)
> +		get_device(&mem->dev);
> +	return mem;
>   }
>   
> -/*
> - * For now, we have a linear search to go find the appropriate
> - * memory_block corresponding to a particular phys_index. If
> - * this gets to be a real problem, we can always use a radix
> - * tree or something here.
> - *
> - * This could be made generic for all device subsystems.
> - */

As this really only makes sense for subsystems with a lot of devices, I 
think it is the right thing to keep it local here for now.

>   struct memory_block *find_memory_block(struct mem_section *section)
>   {
>   	unsigned long block_id = base_memory_block_id(__section_nr(section));
> @@ -636,9 +638,15 @@ int register_memory(struct memory_block *memory)
>   	memory->dev.offline = memory->state == MEM_OFFLINE;
>   
>   	ret = device_register(&memory->dev);
> -	if (ret)
> +	if (ret) {
>   		put_device(&memory->dev);
> -
> +		return ret;
> +	}
> +	ret = radix_tree_insert(&memory_block_tree, memory->dev.id, memory);
> +	if (ret) {
> +		put_device(&memory->dev);
> +		device_unregister(&memory->dev);
> +	}
>   	return ret;
>   }
>   
> @@ -696,6 +704,8 @@ static void unregister_memory(struct memory_block *memory)
>   	if (WARN_ON_ONCE(memory->dev.bus != &memory_subsys))
>   		return;
>   
> +	WARN_ON(radix_tree_delete(&memory_block_tree, memory->dev.id) == NULL);
> +
>   	/* drop the ref. we got via find_memory_block() */
>   	put_device(&memory->dev);
>   	device_unregister(&memory->dev);
> @@ -851,20 +861,21 @@ void __init memory_dev_init(void)
>   int walk_memory_blocks(unsigned long start, unsigned long size,
>   		       void *arg, walk_memory_blocks_func_t func)
>   {
> -	const unsigned long start_block_id = phys_to_block_id(start);
> -	const unsigned long end_block_id = phys_to_block_id(start + size - 1);
> +	struct radix_tree_iter iter;
> +	const unsigned long start_id = phys_to_block_id(start);
> +	const unsigned long end_id = phys_to_block_id(start + size - 1);

Please don't rename these two variables, unrelated change.

>   	struct memory_block *mem;
> -	unsigned long block_id;
> +	void **slot;
>   	int ret = 0;
>   
>   	if (!size)
>   		return 0;
>   
> -	for (block_id = start_block_id; block_id <= end_block_id; block_id++) {
> -		mem = find_memory_block_by_id(block_id);
> -		if (!mem)
> -			continue;
> -
> +	radix_tree_for_each_slot(slot, &memory_block_tree, &iter, start_id) {
> +		mem = radix_tree_deref_slot(slot);
> +		if (mem->dev.id > end_id)
> +			break;

I think you could do a

if (iter.index > end_id)
	break;
mem = radix_tree_deref_slot(slot);

instead, which would be nicer IMHO.

> +		get_device(&mem->dev);
>   		ret = func(mem, arg);
>   		put_device(&mem->dev);

I think we can stop doing the get/put here (similar to 
for_each_memory_block() now); this function should only be called with 
the device hotplug lock held (or early during boot), where no concurrent 
hot(un)plug can happen. I suggest this change as an addon patch; a sketch 
follows below the quoted hunk.

>   		if (ret)
> 
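
Putting both suggestions together -- the iter.index check and dropping the
reference counting -- the walk would read roughly as follows (a sketch,
assuming callers really are serialized by the device hotplug lock or run
single-threaded at boot):

	radix_tree_for_each_slot(slot, &memory_block_tree, &iter, start_id) {
		if (iter.index > end_id)
			break;
		/*
		 * No get_device()/put_device(): with hot(un)plug excluded,
		 * the block cannot vanish during the walk.
		 */
		mem = radix_tree_deref_slot(slot);
		ret = func(mem, arg);
		if (ret)
			break;
	}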

With the changes

Acked-by: David Hildenbrand <david@redhat.com>

Thanks!

-- 

Thanks,

David / dhildenb



* Re: [PATCH] memory subsystem: cache memory blocks in radix tree to accelerate lookup
  2019-11-20 19:25 [PATCH] memory subsystem: cache memory blocks in radix tree to accelerate lookup Scott Cheloha
  2019-11-21  9:34 ` David Hildenbrand
@ 2019-11-21  9:35 ` David Hildenbrand
  2019-11-21 19:59 ` [PATCH v2] drivers/base/memory.c: cache " Scott Cheloha
From: David Hildenbrand @ 2019-11-21  9:35 UTC
  To: Scott Cheloha, linux-kernel, Rafael J. Wysocki, Greg Kroah-Hartman
  Cc: nathanl, ricklind

On 20.11.19 20:25, Scott Cheloha wrote:
> Searching for a particular memory block by id is slow because each block
> device is kept in an unsorted linked list on the subsystem bus.

Oh, and your patch subject should be

"drivers/base/memory.c: cache memory [...]"

-- 

Thanks,

David / dhildenb



* [PATCH v2] drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
  2019-11-20 19:25 [PATCH] memory subsystem: cache memory blocks in radix tree to accelerate lookup Scott Cheloha
  2019-11-21  9:34 ` David Hildenbrand
  2019-11-21  9:35 ` David Hildenbrand
@ 2019-11-21 19:59 ` Scott Cheloha
  2019-11-25  6:36     ` kbuild test robot
  2019-12-17 19:32   ` [PATCH v3] " Scott Cheloha
From: Scott Cheloha @ 2019-11-21 19:59 UTC
  To: linux-kernel, Rafael J. Wysocki, Greg Kroah-Hartman, David Hildenbrand
  Cc: nathanl, ricklind, Scott Cheloha

Searching for a particular memory block by id is slow because each block
device is kept in an unsorted linked list on the subsystem bus.

Lookup is much faster if we cache the blocks in a radix tree.  Memory
subsystem initialization and hotplug/hotunplug are at least a little faster
for any machine with more than ~100 blocks, and the speedup grows with
the block count.

Signed-off-by: Scott Cheloha <cheloha@linux.vnet.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
---
v2 incorporates suggestions from David Hildenbrand.

Removing the get_device/put_device dance from walk_memory_blocks() can
be done in a subsequent patch.

 drivers/base/memory.c | 49 ++++++++++++++++++++++++++-----------------
 1 file changed, 30 insertions(+), 19 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 84c4e1f72cbd..32e74fc1b0ac 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -20,6 +20,7 @@
 #include <linux/memory_hotplug.h>
 #include <linux/mm.h>
 #include <linux/mutex.h>
+#include <linux/radix-tree.h>
 #include <linux/stat.h>
 #include <linux/slab.h>
 
@@ -59,6 +60,13 @@ static struct bus_type memory_subsys = {
 	.offline = memory_subsys_offline,
 };
 
+/*
+ * Memory blocks are cached in a local radix tree to avoid
+ * a costly linear search for the corresponding device on
+ * the subsystem bus.
+ */
+static RADIX_TREE(memory_blocks, GFP_KERNEL);
+
 static BLOCKING_NOTIFIER_HEAD(memory_chain);
 
 int register_memory_notifier(struct notifier_block *nb)
@@ -580,20 +588,14 @@ int __weak arch_get_memory_phys_device(unsigned long start_pfn)
 /* A reference for the returned memory block device is acquired. */
 static struct memory_block *find_memory_block_by_id(unsigned long block_id)
 {
-	struct device *dev;
+	struct memory_block *mem;
 
-	dev = subsys_find_device_by_id(&memory_subsys, block_id, NULL);
-	return dev ? to_memory_block(dev) : NULL;
+	mem = radix_tree_lookup(&memory_blocks, block_id);
+	if (mem)
+		get_device(&mem->dev);
+	return mem;
 }
 
-/*
- * For now, we have a linear search to go find the appropriate
- * memory_block corresponding to a particular phys_index. If
- * this gets to be a real problem, we can always use a radix
- * tree or something here.
- *
- * This could be made generic for all device subsystems.
- */
 struct memory_block *find_memory_block(struct mem_section *section)
 {
 	unsigned long block_id = base_memory_block_id(__section_nr(section));
@@ -636,9 +638,15 @@ int register_memory(struct memory_block *memory)
 	memory->dev.offline = memory->state == MEM_OFFLINE;
 
 	ret = device_register(&memory->dev);
-	if (ret)
+	if (ret) {
 		put_device(&memory->dev);
-
+		return ret;
+	}
+	ret = radix_tree_insert(&memory_blocks, memory->dev.id, memory);
+	if (ret) {
+		put_device(&memory->dev);
+		device_unregister(&memory->dev);
+	}
 	return ret;
 }
 
@@ -696,6 +704,8 @@ static void unregister_memory(struct memory_block *memory)
 	if (WARN_ON_ONCE(memory->dev.bus != &memory_subsys))
 		return;
 
+	WARN_ON(radix_tree_delete(&memory_blocks, memory->dev.id) == NULL);
+
 	/* drop the ref. we got via find_memory_block() */
 	put_device(&memory->dev);
 	device_unregister(&memory->dev);
@@ -851,20 +861,21 @@ void __init memory_dev_init(void)
 int walk_memory_blocks(unsigned long start, unsigned long size,
 		       void *arg, walk_memory_blocks_func_t func)
 {
+	struct radix_tree_iter iter;
 	const unsigned long start_block_id = phys_to_block_id(start);
 	const unsigned long end_block_id = phys_to_block_id(start + size - 1);
 	struct memory_block *mem;
-	unsigned long block_id;
+	void **slot;
 	int ret = 0;
 
 	if (!size)
 		return 0;
 
-	for (block_id = start_block_id; block_id <= end_block_id; block_id++) {
-		mem = find_memory_block_by_id(block_id);
-		if (!mem)
-			continue;
-
+	radix_tree_for_each_slot(slot, &memory_blocks, &iter, start_block_id) {
+		if (iter.index > end_block_id)
+			break;
+		mem = radix_tree_deref_slot(slot);
+		get_device(&mem->dev);
 		ret = func(mem, arg);
 		put_device(&mem->dev);
 		if (ret)
-- 
2.24.0.rc1



* Re: [PATCH v2] drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
  2019-11-21 19:59 ` [PATCH v2] drivers/base/memory.c: cache " Scott Cheloha
@ 2019-11-25  6:36     ` kbuild test robot
  2019-12-17 19:32   ` [PATCH v3] " Scott Cheloha
From: kbuild test robot @ 2019-11-25  6:36 UTC
  To: Scott Cheloha
  Cc: kbuild-all, linux-kernel, Rafael J. Wysocki, Greg Kroah-Hartman,
	David Hildenbrand, nathanl, ricklind, Scott Cheloha

Hi Scott,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on driver-core/driver-core-testing]
[also build test WARNING on v5.4]
[cannot apply to next-20191122]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Scott-Cheloha/drivers-base-memory-c-cache-blocks-in-radix-tree-to-accelerate-lookup/20191124-104557
base:   https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git 0e4a459f56c32d3e52ae69a4b447db2f48a65f44
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.1-36-g9305d48-dirty
        make ARCH=x86_64 allmodconfig
        make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)

>> drivers/base/memory.c:874:9: sparse: sparse: incorrect type in assignment (different address spaces) @@    expected void **slot @@    got void [noderef] <asvoid **slot @@
>> drivers/base/memory.c:874:9: sparse:    expected void **slot
>> drivers/base/memory.c:874:9: sparse:    got void [noderef] <asn:4> **
>> drivers/base/memory.c:874:9: sparse: sparse: incorrect type in assignment (different address spaces) @@    expected void **slot @@    got void [noderef] <asvoid **slot @@
>> drivers/base/memory.c:874:9: sparse:    expected void **slot
>> drivers/base/memory.c:874:9: sparse:    got void [noderef] <asn:4> **
>> drivers/base/memory.c:877:45: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected void [noderef] <asn:4> **slot @@    got n:4> **slot @@
>> drivers/base/memory.c:877:45: sparse:    expected void [noderef] <asn:4> **slot
>> drivers/base/memory.c:877:45: sparse:    got void **slot
   drivers/base/memory.c:874:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected void [noderef] <asn:4> **slot @@    got n:4> **slot @@
   drivers/base/memory.c:874:9: sparse:    expected void [noderef] <asn:4> **slot
   drivers/base/memory.c:874:9: sparse:    got void **slot
>> drivers/base/memory.c:874:9: sparse: sparse: incorrect type in assignment (different address spaces) @@    expected void **slot @@    got void [noderef] <asvoid **slot @@
>> drivers/base/memory.c:874:9: sparse:    expected void **slot
>> drivers/base/memory.c:874:9: sparse:    got void [noderef] <asn:4> **

vim +874 drivers/base/memory.c

   845	
   846	/**
   847	 * walk_memory_blocks - walk through all present memory blocks overlapped
   848	 *			by the range [start, start + size)
   849	 *
   850	 * @start: start address of the memory range
   851	 * @size: size of the memory range
   852	 * @arg: argument passed to func
   853	 * @func: callback for each memory section walked
   854	 *
   855	 * This function walks through all present memory blocks overlapped by the
   856	 * range [start, start + size), calling func on each memory block.
   857	 *
   858	 * In case func() returns an error, walking is aborted and the error is
   859	 * returned.
   860	 */
   861	int walk_memory_blocks(unsigned long start, unsigned long size,
   862			       void *arg, walk_memory_blocks_func_t func)
   863	{
   864		struct radix_tree_iter iter;
   865		const unsigned long start_block_id = phys_to_block_id(start);
   866		const unsigned long end_block_id = phys_to_block_id(start + size - 1);
   867		struct memory_block *mem;
   868		void **slot;
   869		int ret = 0;
   870	
   871		if (!size)
   872			return 0;
   873	
 > 874		radix_tree_for_each_slot(slot, &memory_blocks, &iter, start_block_id) {
   875			if (iter.index > end_block_id)
   876				break;
 > 877			mem = radix_tree_deref_slot(slot);
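
The warnings boil down to the iterator lacking sparse's RCU address-space
annotation.  Assuming the radix-tree API of this kernel generation, the
likely fix is simply declaring the slot as

	void __rcu **slot;

to match the void __rcu ** that radix_tree_for_each_slot() and
radix_tree_deref_slot() expect.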

---
0-DAY kernel test infrastructure                 Open Source Technology Center
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation


* [PATCH v3] drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
  2019-11-21 19:59 ` [PATCH v2] drivers/base/memory.c: cache " Scott Cheloha
  2019-11-25  6:36     ` kbuild test robot
@ 2019-12-17 19:32   ` Scott Cheloha
  2019-12-18  9:00     ` David Hildenbrand
                       ` (3 more replies)
  1 sibling, 4 replies; 24+ messages in thread
From: Scott Cheloha @ 2019-12-17 19:32 UTC
  To: linux-kernel, Rafael J. Wysocki, Greg Kroah-Hartman, David Hildenbrand
  Cc: nathanl, ricklind, Scott Cheloha

Searching for a particular memory block by id is slow because each block
device is kept in an unsorted linked list on the subsystem bus.

Lookup is much faster if we cache the blocks in a radix tree.  Memory
subsystem initialization and hotplug/hotunplug are at least a little faster
for any machine with more than ~100 blocks, and the speedup grows with
the block count.

Signed-off-by: Scott Cheloha <cheloha@linux.vnet.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
---
v2 incorporates suggestions from David Hildenbrand.

v3 changes:
  - Rebase atop "drivers/base/memory.c: drop the mem_sysfs_mutex"

  - Be conservative: don't use radix_tree_for_each_slot() in
    walk_memory_blocks() yet.  It introduces RCU which could
    change behavior.  Walking the tree "by hand" with
    find_memory_block_by_id() is slower but keeps the patch
    simple.
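
Concretely, walk_memory_blocks() keeps its existing loop, so each
iteration now costs one radix tree lookup instead of a linear bus scan:

	for (block_id = start_block_id; block_id <= end_block_id; block_id++) {
		mem = find_memory_block_by_id(block_id); /* radix_tree_lookup() */
		if (!mem)
			continue;

		ret = func(mem, arg);
		put_device(&mem->dev);
		if (ret)
			break;
	}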

 drivers/base/memory.c | 36 +++++++++++++++++++++++-------------
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 799b43191dea..8902930d5ef2 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -19,6 +19,7 @@
 #include <linux/memory.h>
 #include <linux/memory_hotplug.h>
 #include <linux/mm.h>
+#include <linux/radix-tree.h>
 #include <linux/stat.h>
 #include <linux/slab.h>
 
@@ -56,6 +57,13 @@ static struct bus_type memory_subsys = {
 	.offline = memory_subsys_offline,
 };
 
+/*
+ * Memory blocks are cached in a local radix tree to avoid
+ * a costly linear search for the corresponding device on
+ * the subsystem bus.
+ */
+static RADIX_TREE(memory_blocks, GFP_KERNEL);
+
 static BLOCKING_NOTIFIER_HEAD(memory_chain);
 
 int register_memory_notifier(struct notifier_block *nb)
@@ -572,20 +580,14 @@ int __weak arch_get_memory_phys_device(unsigned long start_pfn)
 /* A reference for the returned memory block device is acquired. */
 static struct memory_block *find_memory_block_by_id(unsigned long block_id)
 {
-	struct device *dev;
+	struct memory_block *mem;
 
-	dev = subsys_find_device_by_id(&memory_subsys, block_id, NULL);
-	return dev ? to_memory_block(dev) : NULL;
+	mem = radix_tree_lookup(&memory_blocks, block_id);
+	if (mem)
+		get_device(&mem->dev);
+	return mem;
 }
 
-/*
- * For now, we have a linear search to go find the appropriate
- * memory_block corresponding to a particular phys_index. If
- * this gets to be a real problem, we can always use a radix
- * tree or something here.
- *
- * This could be made generic for all device subsystems.
- */
 struct memory_block *find_memory_block(struct mem_section *section)
 {
 	unsigned long block_id = base_memory_block_id(__section_nr(section));
@@ -628,9 +630,15 @@ int register_memory(struct memory_block *memory)
 	memory->dev.offline = memory->state == MEM_OFFLINE;
 
 	ret = device_register(&memory->dev);
-	if (ret)
+	if (ret) {
 		put_device(&memory->dev);
-
+		return ret;
+	}
+	ret = radix_tree_insert(&memory_blocks, memory->dev.id, memory);
+	if (ret) {
+		put_device(&memory->dev);
+		device_unregister(&memory->dev);
+	}
 	return ret;
 }
 
@@ -688,6 +696,8 @@ static void unregister_memory(struct memory_block *memory)
 	if (WARN_ON_ONCE(memory->dev.bus != &memory_subsys))
 		return;
 
+	WARN_ON(radix_tree_delete(&memory_blocks, memory->dev.id) == NULL);
+
 	/* drop the ref. we got via find_memory_block() */
 	put_device(&memory->dev);
 	device_unregister(&memory->dev);
-- 
2.24.0



* Re: [PATCH v3] drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
  2019-12-17 19:32   ` [PATCH v3] " Scott Cheloha
@ 2019-12-18  9:00     ` David Hildenbrand
  2019-12-19 17:33       ` Scott Cheloha
  2019-12-19 18:09     ` Nathan Lynch
                       ` (2 subsequent siblings)
From: David Hildenbrand @ 2019-12-18  9:00 UTC
  To: Scott Cheloha, linux-kernel, Rafael J. Wysocki, Greg Kroah-Hartman
  Cc: nathanl, ricklind

On 17.12.19 20:32, Scott Cheloha wrote:
> Searching for a particular memory block by id is slow because each block
> device is kept in an unsorted linked list on the subsystem bus.
> 
> Lookup is much faster if we cache the blocks in a radix tree.  Memory
> subsystem initialization and hotplug/hotunplug are at least a little faster
> for any machine with more than ~100 blocks, and the speedup grows with
> the block count.
> 
> Signed-off-by: Scott Cheloha <cheloha@linux.vnet.ibm.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> ---
> v2 incorporates suggestions from David Hildenbrand.
> 
> v3 changes:
>   - Rebase atop "drivers/base/memory.c: drop the mem_sysfs_mutex"
> 
>   - Be conservative: don't use radix_tree_for_each_slot() in
>     walk_memory_blocks() yet.  It introduces RCU which could
>     change behavior.  Walking the tree "by hand" with
>     find_memory_block_by_id() is slower but keeps the patch
>     simple.

Fine with me (splitting it out, e.g., into an addon patch), however, as
readers/writers run mutually exclusively, there is nothing to worry about
here. RCU will not make a difference.

Thanks!

-- 
Thanks,

David / dhildenb



* Re: [PATCH v3] drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
  2019-12-18  9:00     ` David Hildenbrand
@ 2019-12-19 17:33       ` Scott Cheloha
  2019-12-20 10:50         ` David Hildenbrand
From: Scott Cheloha @ 2019-12-19 17:33 UTC
  To: David Hildenbrand
  Cc: linux-kernel, Rafael J. Wysocki, Greg Kroah-Hartman, nathanl, ricklind

On Wed, Dec 18, 2019 at 10:00:49AM +0100, David Hildenbrand wrote:
> On 17.12.19 20:32, Scott Cheloha wrote:
> > Searching for a particular memory block by id is slow because each block
> > device is kept in an unsorted linked list on the subsystem bus.
> > 
> > Lookup is much faster if we cache the blocks in a radix tree.  Memory
> > subsystem initialization and hotplug/hotunplug are at least a little faster
> > for any machine with more than ~100 blocks, and the speedup grows with
> > the block count.
> > 
> > Signed-off-by: Scott Cheloha <cheloha@linux.vnet.ibm.com>
> > Acked-by: David Hildenbrand <david@redhat.com>
> > ---
> > v2 incorporates suggestions from David Hildenbrand.
> > 
> > v3 changes:
> >   - Rebase atop "drivers/base/memory.c: drop the mem_sysfs_mutex"
> > 
> >   - Be conservative: don't use radix_tree_for_each_slot() in
> >     walk_memory_blocks() yet.  It introduces RCU which could
> >     change behavior.  Walking the tree "by hand" with
> >     find_memory_block_by_id() is slower but keeps the patch
> >     simple.
> 
> Fine with me (splitting it out, e.g., into an addon patch), however, as
> readers/writers run mutually exclusively, there is nothing to worry about
> here. RCU will not make a difference.

Oh.  In that case, can you make heads or tails of this CI failure
email I got about the v2 patch?  I've inlined it at the end of this
mail.  "Suspicious RCU usage".  Unclear if it's a false positive.  My
thinking was that I'd hold off on using radix_tree_for_each_slot() and
introducing a possible regression until I understood what was
triggering the robot.

Also, is there anyone else I should shop this patch to?  I copied the
maintainers reported by scripts/get_maintainer.pl but are there others
who might be interested?

--

Here's that CI mail.  I've stripped the attached config because it's
huge.

Date: Sun, 1 Dec 2019 23:05:23 +0800
From: kernel test robot <lkp@intel.com>
To: Scott Cheloha <cheloha@linux.vnet.ibm.com>
Cc: 0day robot <lkp@intel.com>, LKP <lkp@lists.01.org>
Subject: 86dc301f7b ("drivers/base/memory.c: cache blocks in radix tree .."):
 [    1.341517] WARNING: suspicious RCU usage
Message-ID: <20191201150523.GE18573@shao2-debian>

Greetings,

0day kernel testing robot got the below dmesg and the first bad commit is

https://github.com/0day-ci/linux/commits/Scott-Cheloha/drivers-base-memory-c-cache-blocks-in-radix-tree-to-accelerate-lookup/20191124-104557

commit 86dc301f7b4815d90e3a7843ffed655d64efe445
Author:     Scott Cheloha <cheloha@linux.vnet.ibm.com>
AuthorDate: Thu Nov 21 13:59:52 2019 -0600
Commit:     0day robot <lkp@intel.com>
CommitDate: Sun Nov 24 10:45:59 2019 +0800

    drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
    
    Searching for a particular memory block by id is slow because each block
    device is kept in an unsorted linked list on the subsystem bus.
    
    Lookup is much faster if we cache the blocks in a radix tree.  Memory
    subsystem initialization and hotplug/hotunplug are at least a little faster
    for any machine with more than ~100 blocks, and the speedup grows with
    the block count.
    
    Signed-off-by: Scott Cheloha <cheloha@linux.vnet.ibm.com>
    Acked-by: David Hildenbrand <david@redhat.com>

0e4a459f56  tracing: Remove unnecessary DEBUG_FS dependency
86dc301f7b  drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
+---------------------------------------------------------------------+------------+------------+
|                                                                     | 0e4a459f56 | 86dc301f7b |
+---------------------------------------------------------------------+------------+------------+
| boot_successes                                                      | 8          | 0          |
| boot_failures                                                       | 25         | 11         |
| WARNING:possible_circular_locking_dependency_detected               | 25         | 8          |
| WARNING:suspicious_RCU_usage                                        | 0          | 11         |
| include/linux/radix-tree.h:#suspicious_rcu_dereference_check()usage | 0          | 11         |
+---------------------------------------------------------------------+------------+------------+

If you fix the issue, kindly add following tag
Reported-by: kernel test robot <lkp@intel.com>

[    1.335279] random: get_random_bytes called from kcmp_cookies_init+0x29/0x4c with crng_init=0
[    1.336883] ACPI: bus type PCI registered
[    1.338295] PCI: Using configuration type 1 for base access
[    1.340735] 
[    1.341049] =============================
[    1.341517] WARNING: suspicious RCU usage
[    1.342266] 5.4.0-rc5-00070-g86dc301f7b481 #1 Tainted: G                T
[    1.343494] -----------------------------
[    1.344226] include/linux/radix-tree.h:167 suspicious rcu_dereference_check() usage!
[    1.345516] 
[    1.345516] other info that might help us debug this:
[    1.345516] 
[    1.346962] 
[    1.346962] rcu_scheduler_active = 2, debug_locks = 1
[    1.348134] no locks held by swapper/0/1.
[    1.348866] 
[    1.348866] stack backtrace:
[    1.349525] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G                T 5.4.0-rc5-00070-g86dc301f7b481 #1
[    1.351230] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[    1.352720] Call Trace:
[    1.353187]  ? dump_stack+0x9a/0xde
[    1.353507]  ? node_access_release+0x19/0x19
[    1.353507]  ? walk_memory_blocks+0xe6/0x184
[    1.353507]  ? set_debug_rodata+0x20/0x20
[    1.353507]  ? link_mem_sections+0x39/0x3d
[    1.353507]  ? topology_init+0x74/0xc8
[    1.353507]  ? enable_cpu0_hotplug+0x15/0x15
[    1.353507]  ? do_one_initcall+0x13d/0x30a
[    1.353507]  ? kernel_init_freeable+0x18e/0x23b
[    1.353507]  ? rest_init+0x173/0x173
[    1.353507]  ? kernel_init+0x10/0x151
[    1.353507]  ? rest_init+0x173/0x173
[    1.353507]  ? ret_from_fork+0x3a/0x50
[    1.410829] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[    1.427389] cryptd: max_cpu_qlen set to 1000
[    1.457792] ACPI: Added _OSI(Module Device)
[    1.458615] ACPI: Added _OSI(Processor Device)
[    1.459428] ACPI: Added _OSI(3.0 _SCP Extensions)

                                                          # HH:MM RESULT GOOD BAD GOOD_BUT_DIRTY DIRTY_NOT_BAD
git bisect start 1d5f18079cb10420c4e4b67f571b998ec861a20c 219d54332a09e8d8741c1e1982f5eae56099de85 --
git bisect good 00e5729fc7309fd1da26659b7dce0fcc0b46ab7e  # 23:10  G     11     0    9   9  Merge 'linux-review/Daniel-W-S-Almeida/media-dummy_dvb_fe-register-adapter-frontend/20191127-043813' into devel-hourly-2019113015
git bisect  bad da795b1ad310518d9100105235c9ba3ff4ee2524  # 23:32  B      0    11   27   0  Merge 'linux-review/Krzysztof-Kozlowski/media-Fix-Kconfig-indentation/20191122-055026' into devel-hourly-2019113015
git bisect  bad 53f2832a6bf7103957815cef4ad58be22255ea7a  # 00:21  B      0     7   24   0  Merge 'linux-review/Luc-Van-Oostenryck/misc-xilinx_sdfec-add-missing-__user-annotation/20191124-052749' into devel-hourly-2019113015
git bisect good 03a04e32f16a2c4bd267b48e8ab4aaf71c507050  # 00:55  G     11     0    8   8  Merge 'linux-review/Helmut-Grohne/mdio_bus-revert-inadvertent-error-code-change/20191124-171049' into devel-hourly-2019113015
git bisect  bad 03403c59a591b12556bd0db1243eb0503112a0df  # 02:22  B      0    10   26   0  Merge 'linux-review/Navid-Emamdoost/EDAC-Fix-memory-leak-in-i5100_init_one/20191124-103621' into devel-hourly-2019113015
git bisect good fae5c4b30fda35cbcc8389f432dcaac20f9b3a12  # 03:02  G     11     0    7   7  Merge 'linux-review/David-Gow/kunit-Always-print-actual-pointer-values-in-asserts/20191124-131742' into devel-hourly-2019113015
git bisect  bad 1e0c725d2bb21789330a2fe1a360c37ae753eb18  # 04:01  B      0    10   26   0  Merge 'linux-review/Scott-Cheloha/drivers-base-memory-c-cache-blocks-in-radix-tree-to-accelerate-lookup/20191124-104557' into devel-hourly-2019113015
git bisect good 0902b87758e830e857e11f1538e50b218743f4b0  # 04:29  G     10     0    7   7  Merge 'linux-review/Julio-Faracco/drivers-net-virtio_net-Implement-a-dev_watchdog-handler/20191124-135051' into devel-hourly-2019113015
git bisect good 2d0c894673b55b51915ea858eabcf7c97c9b8ccb  # 05:33  G     10     0    6   6  Merge 'linux-review/Maciej-enczykowski/net-ipv6-IPV6_TRANSPARENT-check-NET_RAW-prior-to-NET_ADMIN/20191124-120121' into devel-hourly-2019113015
git bisect good 3c095c856fb84271e56e1869aface37af1b078f1  # 05:58  G     11     0   10  10  Merge 'linux-review/Navid-Emamdoost/Bluetooth-Fix-memory-leak-in-hci_connect_le_scan/20191124-111255' into devel-hourly-2019113015
git bisect  bad 86dc301f7b4815d90e3a7843ffed655d64efe445  # 06:39  B      0    11   27   0  drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
# first bad commit: [86dc301f7b4815d90e3a7843ffed655d64efe445] drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
git bisect good 0e4a459f56c32d3e52ae69a4b447db2f48a65f44  # 07:25  G     33     0   25  25  tracing: Remove unnecessary DEBUG_FS dependency
# extra tests with debug options
git bisect good 86dc301f7b4815d90e3a7843ffed655d64efe445  # 09:13  G     11     0    0   0  drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
# extra tests on head commit of linux-review/Scott-Cheloha/drivers-base-memory-c-cache-blocks-in-radix-tree-to-accelerate-lookup/20191124-104557
git bisect  bad 86dc301f7b4815d90e3a7843ffed655d64efe445  # 09:16  B      0    11   27   0  drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
# bad: [86dc301f7b4815d90e3a7843ffed655d64efe445] drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
# extra tests on revert first bad commit
git bisect good 65bba365eabe04a22e5d68012a45b92eee26860c  # 10:18  G     10     0    9   9  Revert "drivers/base/memory.c: cache blocks in radix tree to accelerate lookup"
# good: [65bba365eabe04a22e5d68012a45b92eee26860c] Revert "drivers/base/memory.c: cache blocks in radix tree to accelerate lookup"

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org       Intel Corporation


* Re: [PATCH v3] drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
  2019-12-17 19:32   ` [PATCH v3] " Scott Cheloha
  2019-12-18  9:00     ` David Hildenbrand
@ 2019-12-19 18:09     ` Nathan Lynch
  2020-01-07 21:48     ` Michal Hocko
  2020-01-09  9:48     ` Michal Hocko
From: Nathan Lynch @ 2019-12-19 18:09 UTC
  To: Scott Cheloha
  Cc: ricklind, Scott Cheloha, linux-kernel, Rafael J. Wysocki,
	Greg Kroah-Hartman, David Hildenbrand

Scott Cheloha <cheloha@linux.vnet.ibm.com> writes:

> Searching for a particular memory block by id is slow because each block
> device is kept in an unsorted linked list on the subsystem bus.
>
> Lookup is much faster if we cache the blocks in a radix tree.  Memory
> subsystem initialization and hotplug/hotunplug are at least a little faster
> for any machine with more than ~100 blocks, and the speedup grows with
> the block count.
>
> Signed-off-by: Scott Cheloha <cheloha@linux.vnet.ibm.com>
> Acked-by: David Hildenbrand <david@redhat.com>

Acked-by: Nathan Lynch <nathanl@linux.ibm.com>

Thanks Scott.


* Re: [PATCH v3] drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
  2019-12-19 17:33       ` Scott Cheloha
@ 2019-12-20 10:50         ` David Hildenbrand
From: David Hildenbrand @ 2019-12-20 10:50 UTC
  To: Scott Cheloha
  Cc: linux-kernel, Rafael J. Wysocki, Greg Kroah-Hartman,
	Michal Hocko, Oscar Salvador, akpm

On 19.12.19 18:33, Scott Cheloha wrote:
> On Wed, Dec 18, 2019 at 10:00:49AM +0100, David Hildenbrand wrote:
>> On 17.12.19 20:32, Scott Cheloha wrote:
>>> Searching for a particular memory block by id is slow because each block
>>> device is kept in an unsorted linked list on the subsystem bus.
>>>
>>> Lookup is much faster if we cache the blocks in a radix tree.  Memory
>>> subsystem initialization and hotplug/hotunplug are at least a little faster
>>> for any machine with more than ~100 blocks, and the speedup grows with
>>> the block count.
>>>
>>> Signed-off-by: Scott Cheloha <cheloha@linux.vnet.ibm.com>
>>> Acked-by: David Hildenbrand <david@redhat.com>
>>> ---
>>> v2 incorporates suggestions from David Hildenbrand.
>>>
>>> v3 changes:
>>>   - Rebase atop "drivers/base/memory.c: drop the mem_sysfs_mutex"
>>>
>>>   - Be conservative: don't use radix_tree_for_each_slot() in
>>>     walk_memory_blocks() yet.  It introduces RCU which could
>>>     change behavior.  Walking the tree "by hand" with
>>>     find_memory_block_by_id() is slower but keeps the patch
>>>     simple.
>>
>> Fine with me (splitting it out, e.g., into an addon patch), however, as
>> readers/writers run mutually exclusively, there is nothing to worry about
>> here. RCU will not make a difference.
> 
> Oh.  In that case, can you make heads or tails of this CI failure
> email I got about the v2 patch?  I've inlined it at the end of this
> mail.  "Suspicious RCU usage".  Unclear if it's a false positive.  My
> thinking was that I'd hold off on using radix_tree_for_each_slot() and
> introducing a possible regression until I understood what was
> triggering the robot.

Ah, did not see that message. See below.

> 
> Also, is there anyone else I should shop this patch to?  I copied the
> maintainers reported by scripts/get_maintainer.pl but are there others
> who might be interested?

On memory hotplug (and related) things, it's usually a good idea to CC
Andrew (who picks up basically all MM stuff), Michal Hocko, and Oscar
Salvador. (cc-ing them)

> 
> --
> 
> Here's that CI mail.  I've stripped the attached config because it's
> huge.
> 
> Date: Sun, 1 Dec 2019 23:05:23 +0800
> From: kernel test robot <lkp@intel.com>
> To: Scott Cheloha <cheloha@linux.vnet.ibm.com>
> Cc: 0day robot <lkp@intel.com>, LKP <lkp@lists.01.org>
> Subject: 86dc301f7b ("drivers/base/memory.c: cache blocks in radix tree .."):
>  [    1.341517] WARNING: suspicious RCU usage
> Message-ID: <20191201150523.GE18573@shao2-debian>
> 
> Greetings,
> 
> 0day kernel testing robot got the below dmesg and the first bad commit is
> 
> https://github.com/0day-ci/linux/commits/Scott-Cheloha/drivers-base-memory-c-cache-blocks-in-radix-tree-to-accelerate-lookup/20191124-104557
> 
> commit 86dc301f7b4815d90e3a7843ffed655d64efe445
> Author:     Scott Cheloha <cheloha@linux.vnet.ibm.com>
> AuthorDate: Thu Nov 21 13:59:52 2019 -0600
> Commit:     0day robot <lkp@intel.com>
> CommitDate: Sun Nov 24 10:45:59 2019 +0800
> 
>     drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
>     
>     Searching for a particular memory block by id is slow because each block
>     device is kept in an unsorted linked list on the subsystem bus.
>     
>     Lookup is much faster if we cache the blocks in a radix tree.  Memory
>     subsystem initialization and hotplug/hotunplug are at least a little faster
>     for any machine with more than ~100 blocks, and the speedup grows with
>     the block count.
>     
>     Signed-off-by: Scott Cheloha <cheloha@linux.vnet.ibm.com>
>     Acked-by: David Hildenbrand <david@redhat.com>
> 
> 0e4a459f56  tracing: Remove unnecessary DEBUG_FS dependency
> 86dc301f7b  drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
> +---------------------------------------------------------------------+------------+------------+
> |                                                                     | 0e4a459f56 | 86dc301f7b |
> +---------------------------------------------------------------------+------------+------------+
> | boot_successes                                                      | 8          | 0          |
> | boot_failures                                                       | 25         | 11         |
> | WARNING:possible_circular_locking_dependency_detected               | 25         | 8          |
> | WARNING:suspicious_RCU_usage                                        | 0          | 11         |
> | include/linux/radix-tree.h:#suspicious_rcu_dereference_check()usage | 0          | 11         |
> +---------------------------------------------------------------------+------------+------------+
> 
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <lkp@intel.com>
> 
> [    1.335279] random: get_random_bytes called from kcmp_cookies_init+0x29/0x4c with crng_init=0
> [    1.336883] ACPI: bus type PCI registered
> [    1.338295] PCI: Using configuration type 1 for base access
> [    1.340735] 
> [    1.341049] =============================
> [    1.341517] WARNING: suspicious RCU usage
> [    1.342266] 5.4.0-rc5-00070-g86dc301f7b481 #1 Tainted: G                T
> [    1.343494] -----------------------------
> [    1.344226] include/linux/radix-tree.h:167 suspicious rcu_dereference_check() usage!
> [    1.345516] 
> [    1.345516] other info that might help us debug this:
> [    1.345516] 
> [    1.346962] 
> [    1.346962] rcu_scheduler_active = 2, debug_locks = 1
> [    1.348134] no locks held by swapper/0/1.
> [    1.348866] 
> [    1.348866] stack backtrace:
> [    1.349525] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G                T 5.4.0-rc5-00070-g86dc301f7b481 #1
> [    1.351230] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> [    1.352720] Call Trace:
> [    1.353187]  ? dump_stack+0x9a/0xde
> [    1.353507]  ? node_access_release+0x19/0x19
> [    1.353507]  ? walk_memory_blocks+0xe6/0x184
> [    1.353507]  ? set_debug_rodata+0x20/0x20
> [    1.353507]  ? link_mem_sections+0x39/0x3d
> [    1.353507]  ? topology_init+0x74/0xc8
> [    1.353507]  ? enable_cpu0_hotplug+0x15/0x15
> [    1.353507]  ? do_one_initcall+0x13d/0x30a
> [    1.353507]  ? kernel_init_freeable+0x18e/0x23b
> [    1.353507]  ? rest_init+0x173/0x173
> [    1.353507]  ? kernel_init+0x10/0x151
> [    1.353507]  ? rest_init+0x173/0x173
> [    1.353507]  ? ret_from_fork+0x3a/0x50
> [    1.410829] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
> [    1.427389] cryptd: max_cpu_qlen set to 1000
> [    1.457792] ACPI: Added _OSI(Module Device)
> [    1.458615] ACPI: Added _OSI(Processor Device)
> [    1.459428] ACPI: Added _OSI(3.0 _SCP Extensions)
> 


We get a complaint that we do a rcu_dereference() without proper protection.

We come via

rest_init()->kernel_init()->kernel_init_freeable()->do_basic_setup()->
do_initcalls()->do_initcall_level()->do_one_initcall()->
topology_init()->register_one_node()->link_mem_sections()->
walk_memory_blocks()

E.g., we add ACPI memory similarly via
    ...do_initcalls()...acpi_init()->acpi_scan_init()
also without holding the device hotplug lock. AFAIK, we can't get
races/concurrent hot(un)plug activity here (we even documented that
for ACPI). And if we did, the code would already be wrong ;)

I assume the simplest and cheapest way to suppress the warning would be
to add rcu_read_lock()/rcu_read_unlock() around the
radix_tree_for_each_slot().
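
A sketch of that (assuming func() never sleeps; if it can, the read-side
section would have to be dropped around the callback):

	rcu_read_lock();
	radix_tree_for_each_slot(slot, &memory_blocks, &iter, start_block_id) {
		if (iter.index > end_block_id)
			break;
		mem = radix_tree_deref_slot(slot);
		get_device(&mem->dev);
		ret = func(mem, arg);
		put_device(&mem->dev);
		if (ret)
			break;
	}
	rcu_read_unlock();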

Another way to suppress the warning would be to take the device hotplug
lock before calling register_one_node() in all arch specific code - but
we had a similar discussion due to the ACPI code back then and decided
to not take the device hotplug lock if not really necessary.

... but ultimately we would want a radix_tree_for_each_slot() variant
that does not perform such checks here. We don't hold any locks and
don't need any locks.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3] drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
  2019-12-17 19:32   ` [PATCH v3] " Scott Cheloha
  2019-12-18  9:00     ` David Hildenbrand
  2019-12-19 18:09     ` Nathan Lynch
@ 2020-01-07 21:48     ` Michal Hocko
  2020-01-08 13:36       ` David Hildenbrand
  2020-01-09  8:49       ` Michal Hocko
  2020-01-09  9:48     ` Michal Hocko
  3 siblings, 2 replies; 24+ messages in thread
From: Michal Hocko @ 2020-01-07 21:48 UTC (permalink / raw)
  To: Scott Cheloha
  Cc: linux-kernel, Rafael J. Wysocki, Greg Kroah-Hartman,
	David Hildenbrand, nathanl, ricklind, Andrew Morton

[Cc Andrew]

On Tue 17-12-19 13:32:38, Scott Cheloha wrote:
> Searching for a particular memory block by id is slow because each block
> device is kept in an unsorted linked list on the subsystem bus.

Noting that this is O(N^2) would be useful.

> Lookup is much faster if we cache the blocks in a radix tree.

While this is really easy and straightforward, is there any reason why
subsys_find_device_by_id has to use such a slow lookup? I suspect nobody
has simply needed a more optimized data structure for that purpose yet.
Would it be too hard to use a radix tree for all lookups rather than
adding a shadow copy for memblocks?

> Memory
> subsystem initialization and hotplug/hotunplug is at least a little faster
> for any machine with more than ~100 blocks, and the speedup grows with
> the block count.

If we are onlining one memblock after another and there are many of them
then this is going to help some. Some numbers would be really nice.
If there are that many blocks, though, the biggest part of the overhead
is still there when registering each memblock, because that operation is
just very expensive (e.g. udev notification).

> Signed-off-by: Scott Cheloha <cheloha@linux.vnet.ibm.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> ---
> v2 incorporates suggestions from David Hildenbrand.
> 
> v3 changes:
>   - Rebase atop "drivers/base/memory.c: drop the mem_sysfs_mutex"
> 
>   - Be conservative: don't use radix_tree_for_each_slot() in
>     walk_memory_blocks() yet.  It introduces RCU which could
>     change behavior.  Walking the tree "by hand" with
>     find_memory_block_by_id() is slower but keeps the patch
>     simple.
> 
>  drivers/base/memory.c | 36 +++++++++++++++++++++++-------------
>  1 file changed, 23 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 799b43191dea..8902930d5ef2 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -19,6 +19,7 @@
>  #include <linux/memory.h>
>  #include <linux/memory_hotplug.h>
>  #include <linux/mm.h>
> +#include <linux/radix-tree.h>
>  #include <linux/stat.h>
>  #include <linux/slab.h>
>  
> @@ -56,6 +57,13 @@ static struct bus_type memory_subsys = {
>  	.offline = memory_subsys_offline,
>  };
>  
> +/*
> + * Memory blocks are cached in a local radix tree to avoid
> + * a costly linear search for the corresponding device on
> + * the subsystem bus.
> + */
> +static RADIX_TREE(memory_blocks, GFP_KERNEL);
> +
>  static BLOCKING_NOTIFIER_HEAD(memory_chain);
>  
>  int register_memory_notifier(struct notifier_block *nb)
> @@ -572,20 +580,14 @@ int __weak arch_get_memory_phys_device(unsigned long start_pfn)
>  /* A reference for the returned memory block device is acquired. */
>  static struct memory_block *find_memory_block_by_id(unsigned long block_id)
>  {
> -	struct device *dev;
> +	struct memory_block *mem;
>  
> -	dev = subsys_find_device_by_id(&memory_subsys, block_id, NULL);
> -	return dev ? to_memory_block(dev) : NULL;
> +	mem = radix_tree_lookup(&memory_blocks, block_id);
> +	if (mem)
> +		get_device(&mem->dev);
> +	return mem;
>  }
>  
> -/*
> - * For now, we have a linear search to go find the appropriate
> - * memory_block corresponding to a particular phys_index. If
> - * this gets to be a real problem, we can always use a radix
> - * tree or something here.
> - *
> - * This could be made generic for all device subsystems.
> - */
>  struct memory_block *find_memory_block(struct mem_section *section)
>  {
>  	unsigned long block_id = base_memory_block_id(__section_nr(section));
> @@ -628,9 +630,15 @@ int register_memory(struct memory_block *memory)
>  	memory->dev.offline = memory->state == MEM_OFFLINE;
>  
>  	ret = device_register(&memory->dev);
> -	if (ret)
> +	if (ret) {
>  		put_device(&memory->dev);
> -
> +		return ret;
> +	}
> +	ret = radix_tree_insert(&memory_blocks, memory->dev.id, memory);
> +	if (ret) {
> +		put_device(&memory->dev);
> +		device_unregister(&memory->dev);
> +	}
>  	return ret;
>  }
>  
> @@ -688,6 +696,8 @@ static void unregister_memory(struct memory_block *memory)
>  	if (WARN_ON_ONCE(memory->dev.bus != &memory_subsys))
>  		return;
>  
> +	WARN_ON(radix_tree_delete(&memory_blocks, memory->dev.id) == NULL);
> +
>  	/* drop the ref. we got via find_memory_block() */
>  	put_device(&memory->dev);
>  	device_unregister(&memory->dev);
> -- 
> 2.24.0
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3] drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
  2020-01-07 21:48     ` Michal Hocko
@ 2020-01-08 13:36       ` David Hildenbrand
  2020-01-08 14:21         ` Michal Hocko
  2020-01-09  8:49       ` Michal Hocko
  1 sibling, 1 reply; 24+ messages in thread
From: David Hildenbrand @ 2020-01-08 13:36 UTC (permalink / raw)
  To: Michal Hocko, Scott Cheloha
  Cc: linux-kernel, Rafael J. Wysocki, Greg Kroah-Hartman, nathanl,
	ricklind, Andrew Morton

On 07.01.20 22:48, Michal Hocko wrote:
> [Cc Andrew]
> 
> On Tue 17-12-19 13:32:38, Scott Cheloha wrote:
>> Searching for a particular memory block by id is slow because each block
>> device is kept in an unsorted linked list on the subsystem bus.
> 
> Noting that this is O(N^2) would be useful.
> 
>> Lookup is much faster if we cache the blocks in a radix tree.
> 
> While this is really easy and straightforward, is there any reason why
> subsys_find_device_by_id has to use such a slow lookup? I suspect nobody
> simply needed a more optimized data structure for that purpose yet.
> Would it be too hard to use radix tree for all lookups rather than
> adding a shadow copy for memblocks?

In reply to v1/v2 I argued that this is really only needed if there are
many devices. So far that seems to be applicable mostly to the memory
subsystem. No need to waste space on all other subsystems IMHO.

As you said, right now it's easy and straightforward, if we find out
other subsystems need it we can generalize/factor out.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3] drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
  2020-01-08 13:36       ` David Hildenbrand
@ 2020-01-08 14:21         ` Michal Hocko
  2020-01-08 15:23           ` David Hildenbrand
  0 siblings, 1 reply; 24+ messages in thread
From: Michal Hocko @ 2020-01-08 14:21 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Scott Cheloha, linux-kernel, Rafael J. Wysocki,
	Greg Kroah-Hartman, nathanl, ricklind, Andrew Morton

On Wed 08-01-20 14:36:48, David Hildenbrand wrote:
> On 07.01.20 22:48, Michal Hocko wrote:
> > [Cc Andrew]
> > 
> > On Tue 17-12-19 13:32:38, Scott Cheloha wrote:
> >> Searching for a particular memory block by id is slow because each block
> >> device is kept in an unsorted linked list on the subsystem bus.
> > 
> > Noting that this is O(N^2) would be useful.
> > 
> >> Lookup is much faster if we cache the blocks in a radix tree.
> > 
> > While this is really easy and straightforward, is there any reason why
> > subsys_find_device_by_id has to use such a slow lookup? I suspect nobody
> > simply needed a more optimized data structure for that purpose yet.
> > Would it be too hard to use radix tree for all lookups rather than
> > adding a shadow copy for memblocks?
> 
> As reply to v1/v2 I argued that this is really only needed if there are
> many devices. So far that seems to be applicable to the memory subsystem
> mostly. No need to waste space on all other subsystems IMHO.

How much space are we talking about? Radix tree (resp. xarray) is a
small data structure and even when we have to allocate nodes dynamically
this doesn't sound like a huge overhead (especially with a small id
space). I might be missing something of course because I am not familiar
with this part of the driver model, and I would be interested in what the
maintainers think about that.

> As you said, right now it's easy and straightforward, if we find out
> other subsystems need it we can generalize/factor out.

I will not really push for that, but it is almost always better to
improve the common infrastructure rather than build up dedicated
workarounds in individual users, especially when there are no strong
arguments for the latter.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3] drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
  2020-01-08 14:21         ` Michal Hocko
@ 2020-01-08 15:23           ` David Hildenbrand
  0 siblings, 0 replies; 24+ messages in thread
From: David Hildenbrand @ 2020-01-08 15:23 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Scott Cheloha, linux-kernel, Rafael J. Wysocki,
	Greg Kroah-Hartman, nathanl, ricklind, Andrew Morton

On 08.01.20 15:21, Michal Hocko wrote:
> On Wed 08-01-20 14:36:48, David Hildenbrand wrote:
>> On 07.01.20 22:48, Michal Hocko wrote:
>>> [Cc Andrew]
>>>
>>> On Tue 17-12-19 13:32:38, Scott Cheloha wrote:
>>>> Searching for a particular memory block by id is slow because each block
>>>> device is kept in an unsorted linked list on the subsystem bus.
>>>
>>> Noting that this is O(N^2) would be useful.
>>>
>>>> Lookup is much faster if we cache the blocks in a radix tree.
>>>
>>> While this is really easy and straightforward, is there any reason why
>>> subsys_find_device_by_id has to use such a slow lookup? I suspect nobody
>>> simply needed a more optimized data structure for that purpose yet.
>>> Would it be too hard to use radix tree for all lookups rather than
>>> adding a shadow copy for memblocks?
>>
>> As reply to v1/v2 I argued that this is really only needed if there are
>> many devices. So far that seems to be applicable to the memory subsystem
>> mostly. No need to waste space on all other subsystems IMHO.
> 
> How much space are we talking about? Radix tree (resp. xarray) is a
> small data structure and even when we have to allocate nodes dynamically
> this doesn't sound like a huge overhead (especially with a small id
> space). I might be missing something of course because I am not familiar
> with this part the driver model and I would be interested what
> maintainers think about that.

It's still wasted space if it's not necessary in the common case.

> 
>> As you said, right now it's easy and straightforward, if we find out
>> other subsystems need it we can generalize/factor out.
> 
> I will not really push for that but it is almost always better to
> improve a common infrastructure rather than build up a dedicated
> workarouns in some users. Especially when there are no strong arguments
> for that.

Yes, if it's worth it for the common case :)

I don't really care in the end either; however, this seems to be the
easiest solution for now - IMHO.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3] drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
  2020-01-07 21:48     ` Michal Hocko
  2020-01-08 13:36       ` David Hildenbrand
@ 2020-01-09  8:49       ` Michal Hocko
  2020-01-09  8:56         ` Greg Kroah-Hartman
  1 sibling, 1 reply; 24+ messages in thread
From: Michal Hocko @ 2020-01-09  8:49 UTC (permalink / raw)
  To: Rafael J. Wysocki, Greg Kroah-Hartman
  Cc: linux-kernel, Scott Cheloha, David Hildenbrand, nathanl,
	ricklind, Andrew Morton

On Tue 07-01-20 22:48:04, Michal Hocko wrote:
> [Cc Andrew]
> 
> On Tue 17-12-19 13:32:38, Scott Cheloha wrote:
> > Searching for a particular memory block by id is slow because each block
> > device is kept in an unsorted linked list on the subsystem bus.
> 
> Noting that this is O(N^2) would be useful.
> 
> > Lookup is much faster if we cache the blocks in a radix tree.
> 
> While this is really easy and straightforward, is there any reason why
> subsys_find_device_by_id has to use such a slow lookup? I suspect nobody
> simply needed a more optimized data structure for that purpose yet.
> Would it be too hard to use radix tree for all lookups rather than
> adding a shadow copy for memblocks?

Greg, Rafael, this seems to be your domain. Do you have any opinion on
this?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3] drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
  2020-01-09  8:49       ` Michal Hocko
@ 2020-01-09  8:56         ` Greg Kroah-Hartman
  2020-01-09  9:19           ` Michal Hocko
  0 siblings, 1 reply; 24+ messages in thread
From: Greg Kroah-Hartman @ 2020-01-09  8:56 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Rafael J. Wysocki, linux-kernel, Scott Cheloha,
	David Hildenbrand, nathanl, ricklind, Andrew Morton

On Thu, Jan 09, 2020 at 09:49:55AM +0100, Michal Hocko wrote:
> On Tue 07-01-20 22:48:04, Michal Hocko wrote:
> > [Cc Andrew]
> > 
> > On Tue 17-12-19 13:32:38, Scott Cheloha wrote:
> > > Searching for a particular memory block by id is slow because each block
> > > device is kept in an unsorted linked list on the subsystem bus.
> > 
> > Noting that this is O(N^2) would be useful.
> > 
> > > Lookup is much faster if we cache the blocks in a radix tree.
> > 
> > While this is really easy and straightforward, is there any reason why
> > subsys_find_device_by_id has to use such a slow lookup? I suspect nobody
> > simply needed a more optimized data structure for that purpose yet.
> > Would it be too hard to use radix tree for all lookups rather than
> > adding a shadow copy for memblocks?
> 
> Greg, Rafael, this seems to be your domain. Do you have any opinion on
> this?

No one has cared about the speed of that call as it has never been on
any "fast path" that I know of.  And it should just be O(N) - isn't it
just walking the list of devices in order?
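
I.e., simplified (illustrative only, the real walk lives in
drivers/base/bus.c):

	struct klist_iter i;
	struct device *dev;

	klist_iter_init_node(&subsys->p->klist_devices, &i, NULL);
	while ((dev = next_device(&i))) {
		/* linear scan over every device on the bus */
		if (dev->id == id && get_device(dev))
			break;	/* found: a reference is held for the caller */
	}
	klist_iter_exit(&i);
	return dev;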

If the "memory subsystem" wants a faster lookup for their objects,
there's nothing stopping you from using your own data structure for the
pointers to the objects if you want.  Just be careful about the lifetime
rules.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3] drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
  2020-01-09  8:56         ` Greg Kroah-Hartman
@ 2020-01-09  9:19           ` Michal Hocko
  2020-01-09  9:24             ` David Hildenbrand
                               ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Michal Hocko @ 2020-01-09  9:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Rafael J. Wysocki, linux-kernel, Scott Cheloha,
	David Hildenbrand, nathanl, ricklind, Andrew Morton

On Thu 09-01-20 09:56:23, Greg KH wrote:
> On Thu, Jan 09, 2020 at 09:49:55AM +0100, Michal Hocko wrote:
> > On Tue 07-01-20 22:48:04, Michal Hocko wrote:
> > > [Cc Andrew]
> > > 
> > > On Tue 17-12-19 13:32:38, Scott Cheloha wrote:
> > > > Searching for a particular memory block by id is slow because each block
> > > > device is kept in an unsorted linked list on the subsystem bus.
> > > 
> > > Noting that this is O(N^2) would be useful.
> > > 
> > > > Lookup is much faster if we cache the blocks in a radix tree.
> > > 
> > > While this is really easy and straightforward, is there any reason why
> > > subsys_find_device_by_id has to use such a slow lookup? I suspect nobody
> > > simply needed a more optimized data structure for that purpose yet.
> > > Would it be too hard to use radix tree for all lookups rather than
> > > adding a shadow copy for memblocks?
> > 
> > Greg, Rafael, this seems to be your domain. Do you have any opinion on
> > this?
> 
> No one has cared about the speed of that call as it has never been on
> any "fast path" that I know of.  And it should just be O(N), isn't it
> just walking the list of devices in order?

Which means that if you have to call it N times then it is O(N^2) and
that is the case here because you are adding N memblocks. See
memory_dev_init
  for each memblock
    add_memory_block
      init_memory_block
        find_memory_block_by_id # checks all existing devices
        register_memory
	  device_register # add new device
  
In this particular case find_memory_block_by_id is called mostly to make
sure we are not re-registering something multiple times, which shouldn't
happen, so it sucks to spend a lot of time on that. We might think of
removing that check for boot time, but who knows what kind of surprises
we might see from crazy HW setups.
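
With the radix tree from this patch the duplicate check could even come
for free at some point (untested thought, not part of the patch):

	/*
	 * radix_tree_insert() fails with -EEXIST when the id is
	 * already present, so the insert itself would catch a double
	 * registration and the separate find_memory_block_by_id()
	 * probe could be dropped from the boot path.
	 */
	ret = radix_tree_insert(&memory_blocks, memory->dev.id, memory);
	if (ret == -EEXIST)
		return ret;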
 
> If the "memory subsystem" wants a faster lookup for their objects,
> there's nothing stopping you from using your own data structure for the
> pointers to the objects if you want.  Just be careful about the lifetime
> rules.

The main question is whether replacing the linked list with a radix tree
in the generic code is something more meaningful.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3] drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
  2020-01-09  9:19           ` Michal Hocko
@ 2020-01-09  9:24             ` David Hildenbrand
  2020-01-09  9:31             ` David Hildenbrand
  2020-01-09  9:33             ` Greg Kroah-Hartman
  2 siblings, 0 replies; 24+ messages in thread
From: David Hildenbrand @ 2020-01-09  9:24 UTC (permalink / raw)
  To: Michal Hocko, Greg Kroah-Hartman
  Cc: Rafael J. Wysocki, linux-kernel, Scott Cheloha, nathanl,
	ricklind, Andrew Morton

On 09.01.20 10:19, Michal Hocko wrote:
> On Thu 09-01-20 09:56:23, Greg KH wrote:
>> On Thu, Jan 09, 2020 at 09:49:55AM +0100, Michal Hocko wrote:
>>> On Tue 07-01-20 22:48:04, Michal Hocko wrote:
>>>> [Cc Andrew]
>>>>
>>>> On Tue 17-12-19 13:32:38, Scott Cheloha wrote:
>>>>> Searching for a particular memory block by id is slow because each block
>>>>> device is kept in an unsorted linked list on the subsystem bus.
>>>>
>>>> Noting that this is O(N^2) would be useful.
>>>>
>>>>> Lookup is much faster if we cache the blocks in a radix tree.
>>>>
>>>> While this is really easy and straightforward, is there any reason why
>>>> subsys_find_device_by_id has to use such a slow lookup? I suspect nobody
>>>> simply needed a more optimized data structure for that purpose yet.
>>>> Would it be too hard to use radix tree for all lookups rather than
>>>> adding a shadow copy for memblocks?
>>>
>>> Greg, Rafael, this seems to be your domain. Do you have any opinion on
>>> this?
>>
>> No one has cared about the speed of that call as it has never been on
>> any "fast path" that I know of.  And it should just be O(N), isn't it
>> just walking the list of devices in order?
> 
> Which means that if you have to call it N times then it is O(N^2) and
> that is the case here because you are adding N memblocks. See
> memory_dev_init
>   for each memblock
>     add_memory_block
>       init_memory_block
>         find_memory_block_by_id # checks all existing devices
>         register_memory
> 	  device_register # add new device
>   
> In this particular case find_memory_block_by_id is called mostly to make
> sure we are no re-registering something multiple times which shouldn't
> happen so it sucks to spend a lot of time on that. We might think of
> removing that for boot time but who knows what kind of surprises we
> might see from crazy HW setups.
>  
>> If the "memory subsystem" wants a faster lookup for their objects,
>> there's nothing stopping you from using your own data structure for the
>> pointers to the objects if you want.  Just be careful about the lifetime
>> rules.
> 
> The main question is whether replacing the linked list with a radix tree
> in the generic code is something more meaningful.
> 

Please note that there are only 2 (!) users of
subsys_find_device_by_id() in total ... which makes me wonder if we
should get rid of subsys_find_device_by_id() instead and handle the
lookup directly in the callers. Rewriting the core (list->tree with all
the locking logic) sounds to me like unnecessary rework.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3] drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
  2020-01-09  9:19           ` Michal Hocko
  2020-01-09  9:24             ` David Hildenbrand
@ 2020-01-09  9:31             ` David Hildenbrand
  2020-01-09  9:41               ` Greg Kroah-Hartman
  2020-01-09  9:33             ` Greg Kroah-Hartman
  2 siblings, 1 reply; 24+ messages in thread
From: David Hildenbrand @ 2020-01-09  9:31 UTC (permalink / raw)
  To: Michal Hocko, Greg Kroah-Hartman
  Cc: Rafael J. Wysocki, linux-kernel, Scott Cheloha, nathanl,
	ricklind, Andrew Morton

On 09.01.20 10:19, Michal Hocko wrote:
> On Thu 09-01-20 09:56:23, Greg KH wrote:
>> On Thu, Jan 09, 2020 at 09:49:55AM +0100, Michal Hocko wrote:
>>> On Tue 07-01-20 22:48:04, Michal Hocko wrote:
>>>> [Cc Andrew]
>>>>
>>>> On Tue 17-12-19 13:32:38, Scott Cheloha wrote:
>>>>> Searching for a particular memory block by id is slow because each block
>>>>> device is kept in an unsorted linked list on the subsystem bus.
>>>>
>>>> Noting that this is O(N^2) would be useful.
>>>>
>>>>> Lookup is much faster if we cache the blocks in a radix tree.
>>>>
>>>> While this is really easy and straightforward, is there any reason why
>>>> subsys_find_device_by_id has to use such a slow lookup? I suspect nobody
>>>> simply needed a more optimized data structure for that purpose yet.
>>>> Would it be too hard to use radix tree for all lookups rather than
>>>> adding a shadow copy for memblocks?
>>>
>>> Greg, Rafael, this seems to be your domain. Do you have any opinion on
>>> this?
>>
>> No one has cared about the speed of that call as it has never been on
>> any "fast path" that I know of.  And it should just be O(N), isn't it
>> just walking the list of devices in order?
> 
> Which means that if you have to call it N times then it is O(N^2) and
> that is the case here because you are adding N memblocks. See
> memory_dev_init
>   for each memblock
>     add_memory_block
>       init_memory_block
>         find_memory_block_by_id # checks all existing devices
>         register_memory
> 	  device_register # add new device
>   
> In this particular case find_memory_block_by_id is called mostly to make
> sure we are no re-registering something multiple times which shouldn't
> happen so it sucks to spend a lot of time on that. We might think of
> removing that for boot time but who knows what kind of surprises we
> might see from crazy HW setups.

Oh, and please note (as discussed in v1 or v2 of this patch as well)
that the lookup is also performed in walk_memory_blocks() for each
memory block in the range, e.g., via link_mem_sections() on system boot.
There we have O(N^2) as well.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3] drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
  2020-01-09  9:19           ` Michal Hocko
  2020-01-09  9:24             ` David Hildenbrand
  2020-01-09  9:31             ` David Hildenbrand
@ 2020-01-09  9:33             ` Greg Kroah-Hartman
  2020-01-09  9:50               ` Michal Hocko
  2 siblings, 1 reply; 24+ messages in thread
From: Greg Kroah-Hartman @ 2020-01-09  9:33 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Rafael J. Wysocki, linux-kernel, Scott Cheloha,
	David Hildenbrand, nathanl, ricklind, Andrew Morton

On Thu, Jan 09, 2020 at 10:19:34AM +0100, Michal Hocko wrote:
> On Thu 09-01-20 09:56:23, Greg KH wrote:
> > On Thu, Jan 09, 2020 at 09:49:55AM +0100, Michal Hocko wrote:
> > > On Tue 07-01-20 22:48:04, Michal Hocko wrote:
> > > > [Cc Andrew]
> > > > 
> > > > On Tue 17-12-19 13:32:38, Scott Cheloha wrote:
> > > > > Searching for a particular memory block by id is slow because each block
> > > > > device is kept in an unsorted linked list on the subsystem bus.
> > > > 
> > > > Noting that this is O(N^2) would be useful.
> > > > 
> > > > > Lookup is much faster if we cache the blocks in a radix tree.
> > > > 
> > > > While this is really easy and straightforward, is there any reason why
> > > > subsys_find_device_by_id has to use such a slow lookup? I suspect nobody
> > > > simply needed a more optimized data structure for that purpose yet.
> > > > Would it be too hard to use radix tree for all lookups rather than
> > > > adding a shadow copy for memblocks?
> > > 
> > > Greg, Rafael, this seems to be your domain. Do you have any opinion on
> > > this?
> > 
> > No one has cared about the speed of that call as it has never been on
> > any "fast path" that I know of.  And it should just be O(N), isn't it
> > just walking the list of devices in order?
> 
> Which means that if you have to call it N times then it is O(N^2) and
> that is the case here because you are adding N memblocks. See
> memory_dev_init
>   for each memblock
>     add_memory_block
>       init_memory_block
>         find_memory_block_by_id # checks all existing devices
>         register_memory
> 	  device_register # add new device
>   
> In this particular case find_memory_block_by_id is called mostly to make
> sure we are no re-registering something multiple times which shouldn't
> happen so it sucks to spend a lot of time on that. We might think of
> removing that for boot time but who knows what kind of surprises we
> might see from crazy HW setups.

Ok, so this is a self-inflicted issue, not a driver core issue :)

> > If the "memory subsystem" wants a faster lookup for their objects,
> > there's nothing stopping you from using your own data structure for the
> > pointers to the objects if you want.  Just be careful about the lifetime
> > rules.
> 
> The main question is whether replacing the linked list with a radix tree
> in the generic code is something more meaningful.

I strongly doubt it, it looks like you all are doing something very
specific to your subsystem that would need this type of speed/lookup.  I
suggest doing it on your own for now.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3] drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
  2020-01-09  9:31             ` David Hildenbrand
@ 2020-01-09  9:41               ` Greg Kroah-Hartman
  0 siblings, 0 replies; 24+ messages in thread
From: Greg Kroah-Hartman @ 2020-01-09  9:41 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Michal Hocko, Rafael J. Wysocki, linux-kernel, Scott Cheloha,
	nathanl, ricklind, Andrew Morton

On Thu, Jan 09, 2020 at 10:31:21AM +0100, David Hildenbrand wrote:
> On 09.01.20 10:19, Michal Hocko wrote:
> > On Thu 09-01-20 09:56:23, Greg KH wrote:
> >> On Thu, Jan 09, 2020 at 09:49:55AM +0100, Michal Hocko wrote:
> >>> On Tue 07-01-20 22:48:04, Michal Hocko wrote:
> >>>> [Cc Andrew]
> >>>>
> >>>> On Tue 17-12-19 13:32:38, Scott Cheloha wrote:
> >>>>> Searching for a particular memory block by id is slow because each block
> >>>>> device is kept in an unsorted linked list on the subsystem bus.
> >>>>
> >>>> Noting that this is O(N^2) would be useful.
> >>>>
> >>>>> Lookup is much faster if we cache the blocks in a radix tree.
> >>>>
> >>>> While this is really easy and straightforward, is there any reason why
> >>>> subsys_find_device_by_id has to use such a slow lookup? I suspect nobody
> >>>> simply needed a more optimized data structure for that purpose yet.
> >>>> Would it be too hard to use radix tree for all lookups rather than
> >>>> adding a shadow copy for memblocks?
> >>>
> >>> Greg, Rafael, this seems to be your domain. Do you have any opinion on
> >>> this?
> >>
> >> No one has cared about the speed of that call as it has never been on
> >> any "fast path" that I know of.  And it should just be O(N), isn't it
> >> just walking the list of devices in order?
> > 
> > Which means that if you have to call it N times then it is O(N^2) and
> > that is the case here because you are adding N memblocks. See
> > memory_dev_init
> >   for each memblock
> >     add_memory_block
> >       init_memory_block
> >         find_memory_block_by_id # checks all existing devices
> >         register_memory
> > 	  device_register # add new device
> >   
> > In this particular case find_memory_block_by_id is called mostly to make
> > sure we are no re-registering something multiple times which shouldn't
> > happen so it sucks to spend a lot of time on that. We might think of
> > removing that for boot time but who knows what kind of surprises we
> > might see from crazy HW setups.
> 
> Oh, and please note (as discussed in v1 or v2 of this patch as well)
> that the lookup is also performed in walk_memory_blocks() for each
> memory block in the range, e.g., via link_mem_sections() on system boot.
> There we have O(N^2) as well.

Ok, again self-inflicted, I suggest you all roll your own logic for this
highly accessed set of things :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3] drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
  2019-12-17 19:32   ` [PATCH v3] " Scott Cheloha
                       ` (2 preceding siblings ...)
  2020-01-07 21:48     ` Michal Hocko
@ 2020-01-09  9:48     ` Michal Hocko
  3 siblings, 0 replies; 24+ messages in thread
From: Michal Hocko @ 2020-01-09  9:48 UTC (permalink / raw)
  To: Scott Cheloha
  Cc: linux-kernel, Rafael J. Wysocki, Greg Kroah-Hartman,
	David Hildenbrand, nathanl, ricklind, Andrew Morton

On Tue 17-12-19 13:32:38, Scott Cheloha wrote:
> Searching for a particular memory block by id is slow because each block
> device is kept in an unsorted linked list on the subsystem bus.
> 
> Lookup is much faster if we cache the blocks in a radix tree.  Memory
> subsystem initialization and hotplug/hotunplug is at least a little faster
> for any machine with more than ~100 blocks, and the speedup grows with
> the block count.
> 
> Signed-off-by: Scott Cheloha <cheloha@linux.vnet.ibm.com>
> Acked-by: David Hildenbrand <david@redhat.com>

OK, Greg doesn't see demand for abstracting a faster lookup into the
core, so keeping this local to the memory subsystem sounds like the way
to go. Please add more information about the time savings to the
changelog and
then feel free to add
Acked-by: Michal Hocko <mhocko@suse.com>

Do not forget to add Andrew to the Cc when resubmitting.

Thanks!

> ---
> v2 incorporates suggestions from David Hildenbrand.
> 
> v3 changes:
>   - Rebase atop "drivers/base/memory.c: drop the mem_sysfs_mutex"
> 
>   - Be conservative: don't use radix_tree_for_each_slot() in
>     walk_memory_blocks() yet.  It introduces RCU which could
>     change behavior.  Walking the tree "by hand" with
>     find_memory_block_by_id() is slower but keeps the patch
>     simple.
> 
>  drivers/base/memory.c | 36 +++++++++++++++++++++++-------------
>  1 file changed, 23 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 799b43191dea..8902930d5ef2 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -19,6 +19,7 @@
>  #include <linux/memory.h>
>  #include <linux/memory_hotplug.h>
>  #include <linux/mm.h>
> +#include <linux/radix-tree.h>
>  #include <linux/stat.h>
>  #include <linux/slab.h>
>  
> @@ -56,6 +57,13 @@ static struct bus_type memory_subsys = {
>  	.offline = memory_subsys_offline,
>  };
>  
> +/*
> + * Memory blocks are cached in a local radix tree to avoid
> + * a costly linear search for the corresponding device on
> + * the subsystem bus.
> + */
> +static RADIX_TREE(memory_blocks, GFP_KERNEL);
> +
>  static BLOCKING_NOTIFIER_HEAD(memory_chain);
>  
>  int register_memory_notifier(struct notifier_block *nb)
> @@ -572,20 +580,14 @@ int __weak arch_get_memory_phys_device(unsigned long start_pfn)
>  /* A reference for the returned memory block device is acquired. */
>  static struct memory_block *find_memory_block_by_id(unsigned long block_id)
>  {
> -	struct device *dev;
> +	struct memory_block *mem;
>  
> -	dev = subsys_find_device_by_id(&memory_subsys, block_id, NULL);
> -	return dev ? to_memory_block(dev) : NULL;
> +	mem = radix_tree_lookup(&memory_blocks, block_id);
> +	if (mem)
> +		get_device(&mem->dev);
> +	return mem;
>  }
>  
> -/*
> - * For now, we have a linear search to go find the appropriate
> - * memory_block corresponding to a particular phys_index. If
> - * this gets to be a real problem, we can always use a radix
> - * tree or something here.
> - *
> - * This could be made generic for all device subsystems.
> - */
>  struct memory_block *find_memory_block(struct mem_section *section)
>  {
>  	unsigned long block_id = base_memory_block_id(__section_nr(section));
> @@ -628,9 +630,15 @@ int register_memory(struct memory_block *memory)
>  	memory->dev.offline = memory->state == MEM_OFFLINE;
>  
>  	ret = device_register(&memory->dev);
> -	if (ret)
> +	if (ret) {
>  		put_device(&memory->dev);
> -
> +		return ret;
> +	}
> +	ret = radix_tree_insert(&memory_blocks, memory->dev.id, memory);
> +	if (ret) {
> +		put_device(&memory->dev);
> +		device_unregister(&memory->dev);
> +	}
>  	return ret;
>  }
>  
> @@ -688,6 +696,8 @@ static void unregister_memory(struct memory_block *memory)
>  	if (WARN_ON_ONCE(memory->dev.bus != &memory_subsys))
>  		return;
>  
> +	WARN_ON(radix_tree_delete(&memory_blocks, memory->dev.id) == NULL);
> +
>  	/* drop the ref. we got via find_memory_block() */
>  	put_device(&memory->dev);
>  	device_unregister(&memory->dev);
> -- 
> 2.24.0
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3] drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
  2020-01-09  9:33             ` Greg Kroah-Hartman
@ 2020-01-09  9:50               ` Michal Hocko
  0 siblings, 0 replies; 24+ messages in thread
From: Michal Hocko @ 2020-01-09  9:50 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Rafael J. Wysocki, linux-kernel, Scott Cheloha,
	David Hildenbrand, nathanl, ricklind, Andrew Morton

On Thu 09-01-20 10:33:59, Greg KH wrote:
> On Thu, Jan 09, 2020 at 10:19:34AM +0100, Michal Hocko wrote:
> > On Thu 09-01-20 09:56:23, Greg KH wrote:
> > > On Thu, Jan 09, 2020 at 09:49:55AM +0100, Michal Hocko wrote:
> > > > On Tue 07-01-20 22:48:04, Michal Hocko wrote:
> > > > > [Cc Andrew]
> > > > > 
> > > > > On Tue 17-12-19 13:32:38, Scott Cheloha wrote:
> > > > > > Searching for a particular memory block by id is slow because each block
> > > > > > device is kept in an unsorted linked list on the subsystem bus.
> > > > > 
> > > > > Noting that this is O(N^2) would be useful.
> > > > > 
> > > > > > Lookup is much faster if we cache the blocks in a radix tree.
> > > > > 
> > > > > While this is really easy and straightforward, is there any reason why
> > > > > subsys_find_device_by_id has to use such a slow lookup? I suspect nobody
> > > > > simply needed a more optimized data structure for that purpose yet.
> > > > > Would it be too hard to use radix tree for all lookups rather than
> > > > > adding a shadow copy for memblocks?
> > > > 
> > > > Greg, Rafael, this seems to be your domain. Do you have any opinion on
> > > > this?
> > > 
> > > No one has cared about the speed of that call as it has never been on
> > > any "fast path" that I know of.  And it should just be O(N), isn't it
> > > just walking the list of devices in order?
> > 
> > Which means that if you have to call it N times then it is O(N^2) and
> > that is the case here because you are adding N memblocks. See
> > memory_dev_init
> >   for each memblock
> >     add_memory_block
> >       init_memory_block
> >         find_memory_block_by_id # checks all existing devices
> >         register_memory
> > 	  device_register # add new device
> >   
> > In this particular case find_memory_block_by_id is called mostly to make
> > sure we are no re-registering something multiple times which shouldn't
> > happen so it sucks to spend a lot of time on that. We might think of
> > removing that for boot time but who knows what kind of surprises we
> > might see from crazy HW setups.
> 
> Ok, so this is a self-inflicted issue, not a driver core issue :)
> 
> > > If the "memory subsystem" wants a faster lookup for their objects,
> > > there's nothing stopping you from using your own data structure for the
> > > pointers to the objects if you want.  Just be careful about the lifetime
> > > rules.
> > 
> > The main question is whether replacing the linked list with a radix tree
> > in the generic code is something more meaningful.
> 
> I strongly doubt it, it looks like you all are doing something very
> specific to your subsystem that would need this type of speed/lookup.  I
> suggest doing it on your own for now.

OK, fair enough.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2020-01-09  9:50 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-20 19:25 [PATCH] memory subsystem: cache memory blocks in radix tree to accelerate lookup Scott Cheloha
2019-11-21  9:34 ` David Hildenbrand
2019-11-21  9:35 ` David Hildenbrand
2019-11-21 19:59 ` [PATCH v2] drivers/base/memory.c: cache " Scott Cheloha
2019-11-25  6:36   ` kbuild test robot
2019-11-25  6:36     ` kbuild test robot
2019-12-17 19:32   ` [PATCH v3] " Scott Cheloha
2019-12-18  9:00     ` David Hildenbrand
2019-12-19 17:33       ` Scott Cheloha
2019-12-20 10:50         ` David Hildenbrand
2019-12-19 18:09     ` Nathan Lynch
2020-01-07 21:48     ` Michal Hocko
2020-01-08 13:36       ` David Hildenbrand
2020-01-08 14:21         ` Michal Hocko
2020-01-08 15:23           ` David Hildenbrand
2020-01-09  8:49       ` Michal Hocko
2020-01-09  8:56         ` Greg Kroah-Hartman
2020-01-09  9:19           ` Michal Hocko
2020-01-09  9:24             ` David Hildenbrand
2020-01-09  9:31             ` David Hildenbrand
2020-01-09  9:41               ` Greg Kroah-Hartman
2020-01-09  9:33             ` Greg Kroah-Hartman
2020-01-09  9:50               ` Michal Hocko
2020-01-09  9:48     ` Michal Hocko
