* [0/7, v9] NUMA Hotplug Emulator (v9)
@ 2010-12-10  7:31 ` shaohui.zheng
  0 siblings, 0 replies; 61+ messages in thread
From: shaohui.zheng @ 2010-12-10  7:31 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
	dave, gregkh

* PATCHSET INTRODUCTION

patch 1: Documentation.
patch 2: Add a numa=possible=<N> command line option to set an additional N nodes
         as being possible for memory hotplug.
patch 3: Add node hotplug emulation; introduce the debugfs node/add_node interface.
patch 4: Abstract the CPU register functions to make these interfaces friendly to
         CPU hotplug emulation.
patch 5: Support CPU probe/release on x86; it provides a software method to
         hot-add/remove CPUs through a sysfs interface.
patch 6: Fake a CPU socket with a logical CPU on x86, to prevent the scheduling
         domains from being built with an incorrect hierarchy.
patch 7: Implement the per-node add_memory debugfs interface.

* FEEDBACK & RESPONSES

v9:

Fix the bug reported by Eric B Munson: check the return value of cpu_down when
doing a CPU release.

Resolve the conflicts with Tejun Heo's NUMA unification code; rework patch 5 on top
of his patch.

Some small changes to the per-node add_memory debugfs interface.

v8:

Reconsidered David's proposal and accepted the per-node add_memory interface in debugfs
(patch 7).

v7:

David:    We don't need two different interfaces, one in sysfs and one in debugfs,
          to hotplug memory.
Response: We use debugfs for memory hotplug emulation only. We did not modify the
          sysfs memory probe interface, so we removed the original patch 7 from
          the patchset.
David:    Suggest new probe files in debugfs for each online node:
            /sys/kernel/debug/mem_hotplug/add_node (already exists)
            /sys/kernel/debug/mem_hotplug/node0/add_memory
            /sys/kernel/debug/mem_hotplug/node1/add_memory

Response: We need not make a simple thing so complicated; we'd prefer to
          rename the mem_hotplug/probe interface to mem_hotplug/add_memory:
            /sys/kernel/debug/mem_hotplug/add_node (already exists)
            /sys/kernel/debug/mem_hotplug/add_memory (probe renamed to add_memory)

v6:

Greg KH:  Suggest using the interface mem_hotplug/add_node.
David:    Agree with Greg's suggestion.
Response: We moved the interface from node/add_node to mem_hotplug/add_node, and we also
          moved the memory/probe interface to mem_hotplug/probe since both are related to
          memory hotplug.

Kletnieks Valdis: Suggest renumbering the patch series and moving patch 8/8 to patch 1/8.
Response: Moved patch 8/8 to patch 1/8; we will include the full description in 0/8 when
          we send patches in the future.

v5:

David: Suggests a more flexible method for node hotplug emulation. After
       reviewing our two emulator implementations, David provided a better solution
       that addresses both the flexibility and the memory-wasting issues.

       It adds a numa=possible=<N> command line option and provides the sysfs interface
       /sys/devices/system/node/add_node; the interface was later moved to debugfs as
       /sys/kernel/debug/hotplug/add_node after feedback from the community.

Greg KH: Move the interface from hotplug/add_node to node/add_node.

Response: Accepted David's numa=possible=<N> command line option. After talking
       with David, he agreed to add his patch to our patchset; thanks to David for his
       solution (patch 1).

       David's original interface /sys/kernel/debug/hotplug/add_node is not so clear for
       node hotplug emulation, so we accepted Greg's suggestion and moved the interface to
       node/add_node (patch 2).

Dave Hansen: For memory hotplug, Dave recalled Greg KH's advice and suggested that we use
       configfs instead of sysfs. After learning that it is just for testing purposes, Dave
       thought debugfs would be the best choice.

Response: The memory probe sysfs interface already exists; I'd like to keep it and extend it
       to support adding memory to a specified node (patch 6).

       We accepted Dave's suggestion and implemented the memory probe interface in debugfs
       (patch 7).

Randy Dunlap: Corrected many grammatical errors in our documentation (patch 8).

Response: Thanks to Randy for his careful review; we have corrected them.

v4: 

Split out the CPU hotplug emulation code, since David has sent a patchset for node hotplug emulation.

v3 & v2:

1) Patch 0
Balbir & Greg: Suggest using a tool such as git or quilt to manage and send the patchset.
Response: Thanks for the recommendation. With help from Fengguang, I got quilt
          working; it is a great tool.

2) Patch 2
Jaswinder Singh: if (hidden_num) is not required in patch 2
Response: Good catch; it is removed in v2.


3) Patch 3
Dave Hansen: Suggest creating a dedicated sysfs file for each possible node.
Greg:     How big would this "list" be?  What will it look like exactly?
Haicheng: It should follow "one value per file". It is intended to show the
          acceptable parameters.

          For example, if we have 4 fake offlined nodes, like node 2-5, then:
              $ cat /sys/devices/system/node/probe
              2-5

          Then the user hot-adds node 3 to the system:
              $ echo 3 > /sys/devices/system/node/probe
              $ cat /sys/devices/system/node/probe
              2,4-5

Greg:     As you are trying to add a new sysfs file, please create the matching
          Documentation/ABI/ file as well.
Response: We missed it; it was added in v2.

Patch 4 & 5:
Paul Mundt: This looks like an incredibly painful interface. How about scrapping all
of this _emu() mess and just reworking the register_cpu() interface?
Response: Accepted Paul's suggestion and removed the cpu _emu functions.

Patch 7:
Dave Hansen: If we're going to put multiple values into the file now and
          add to the ABI, can we be more explicit about it?
              echo "physical_address=0x40000000 numa_node=3" > memory/probe
Response: Dave's new interface was accepted, and we still keep the old
          format for compatibility. We documented these interfaces in
          Documentation/ABI in v2.
Greg:     Suggest using configfs for the memory probe interface.
Andi:     This is a debugging interface. It doesn't need to have the
          most pretty interface in the world, because it will be only used for
          QA by a few people. it's just a QA interface, not the next generation
          of POSIX.
Response: We still keep it as a sysfs interface since the node/cpu/memory probe interfaces
          are all in sysfs; we can create another group of patches to support
          configfs if there is a strong requirement in the future.

v1:

the RFC version for NUMA Hotplug Emulator.

* WHAT IS HOTPLUG EMULATOR

"NUMA hotplug emulator" is the collective name for the hotplug emulation
features. It emulates NUMA node hotplug purely in software. It is intended
to help people easily debug and test node/CPU/memory hotplug related code
on a machine without NUMA hotplug support, even a UMA machine.

The emulator provides a mechanism to emulate the process of physical CPU/memory
hot-add, which lets kernel developers debug CPU and memory hotplug on machines
without NUMA support. It offers an interface for CPU and memory hotplug testing.

* WHY DO WE USE HOTPLUG EMULATOR

We have been focusing on hotplug emulation for a few months. The emulator helps
the team reproduce all the major hotplug bugs, and it plays an important role in
hotplug code quality assurance. Because of the hotplug emulator, we have already
moved most of the debugging work to a virtual environment.

* PRINCIPLES & USAGE

The NUMA hotplug emulator includes three parts: node, CPU, and memory hotplug emulation.

1) Node hotplug emulation:

Adds a numa=possible=<N> command line option to set an additional N nodes as
being possible for memory hotplug. This set of possible nodes controls
nr_node_ids and the sizes of several dynamically allocated node arrays.

This allows memory hotplug to create new nodes for newly added memory
rather than binding it to existing nodes.

For emulation on x86, it would be possible to set aside memory for hotplugged
nodes (say, anything above 2G) and to add an additional four nodes as being
possible on boot with

	mem=2G numa=possible=4

and then creating a new 128M node at runtime:

	# echo 128M@0x80000000 > /sys/kernel/debug/mem_hotplug/add_node
	On node 1 totalpages: 0
	init_memory_mapping: 0000000080000000-0000000088000000
	 0080000000 - 0088000000 page 2M

Once the new node has been added, its memory can be onlined.  If this
memory represents memory section 16, for example:

	# echo online > /sys/devices/system/memory/memory16/state
	Built 2 zonelists in Node order, mobility grouping on.  Total pages: 514846
	Policy zone: Normal
 [ The memory section(s) mapped to a particular node are visible via
   /sys/devices/system/node/node1, in this example. ]
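
As a rough illustration of where "memory section 16" comes from: the section
number is the physical start address divided by the memory section size. The
sketch below assumes a 128 MiB section size (the usual x86_64 value, which is
not stated in this posting); section_index is a hypothetical helper name.

```shell
# Sketch: compute which memoryN directory covers a physical address.
# The default section size of 0x8000000 (128 MiB) is an assumption; it can
# be checked against /sys/devices/system/memory/block_size_bytes.
section_index() {
	addr="$1"                # physical start address, e.g. 0x80000000
	size="${2:-0x8000000}"   # section size in bytes (assumed default)
	# $addr and $size are expanded first, so hex constants are accepted.
	echo $(( $addr / $size ))
}
```

With the 128M@0x80000000 node created above, "section_index 0x80000000"
prints 16, which matches the memory16 directory used in the example.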

2) CPU hotplug emulation:

The emulator reserves CPUs through a grub (kernel boot) parameter; the reserved
CPUs can then be hot-added and hot-removed in software.

When hotplugging a CPU with the emulator, we use a logical CPU to emulate the CPU
hotplug process. On CPUs that support SMT, some logical CPUs are in the same
socket, but with the emulator they may be located in different NUMA nodes. We
put the logical CPU into a fake CPU socket and assign it a unique
phys_proc_id. Each fake socket holds only one logical CPU.

 - to hide CPUs
	- Use the boot option "maxcpus=N" to hide CPUs;
	  N is the number of CPUs to initialize.
	- Use the boot option "cpu_hpe=on" to enable CPU hotplug emulation;
	  when cpu_hpe is enabled, the remaining CPUs will not be initialized.

 - to hot-add CPU to node
	# echo nid > cpu/probe

 - to hot-remove CPU
	# echo nid > cpu/release
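
The two commands above can be wrapped in small helpers. This is only a sketch:
CPU_SYSFS is an assumed location for the probe and release files (the text
gives only the relative paths cpu/probe and cpu/release), and the function
names are hypothetical. The value is written unchanged, as in the examples
above.

```shell
# Assumed sysfs directory holding the probe and release files.
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"

# Write a numeric id to one of the CPU hotplug emulation files.
cpu_hpe_write() {
	file="$1"
	id="$2"
	case "$id" in
	''|*[!0-9]*)
		# Reject empty or non-numeric input before touching sysfs.
		echo "invalid id: '$id'" >&2
		return 1
		;;
	esac
	echo "$id" > "$CPU_SYSFS/$file"
}

cpu_hot_add()    { cpu_hpe_write probe "$1"; }
cpu_hot_remove() { cpu_hpe_write release "$1"; }
```

For example, "cpu_hot_add 1" is equivalent to "echo 1 > cpu/probe" run from
the assumed directory; it needs root on a kernel carrying these patches.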

3) Memory hotplug emulation:

The emulator reserves memory before the OS boots; the reserved memory region is
removed from the e820 table. Each online node has an add_memory interface, and
memory can be hot-added via the per-node add_memory debugfs interface.

 - reserve memory through a kernel boot parameter
 	mem=1024m

 - add a memory section to node 3
    # echo 0x40000000 > mem_hotplug/node3/add_memory
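
To hot-add several consecutive sections to one node, the start addresses can
be generated in a loop. The sketch below assumes a 128 MiB memory section size
and that each write to add_memory adds one section, as in the example above;
section_addrs is a hypothetical helper name.

```shell
# Sketch: print the start address of each section in a reserved range.
# The default step of 0x8000000 (128 MiB) is an assumed section size.
section_addrs() {
	start="$1"               # first physical address, e.g. 0x40000000
	count="$2"               # number of sections to list
	step="${3:-0x8000000}"   # section size in bytes (assumed default)
	i=0
	while [ "$i" -lt "$count" ]; do
		printf '0x%x\n' $(( $start + i * $step ))
		i=$(( i + 1 ))
	done
}

# Possible use, as root with debugfs mounted at /sys/kernel/debug:
# for a in $(section_addrs 0x40000000 4); do
#	echo "$a" > /sys/kernel/debug/mem_hotplug/node3/add_memory
# done
```

Here "section_addrs 0x40000000 2" prints 0x40000000 and 0x48000000, one per
line.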

* ACKNOWLEDGMENT 

The NUMA Hotplug Emulator is a team effort; thanks to all of them:
Andi Kleen, Haicheng Li, Shaohui Zheng, Fengguang Wu, David Rientjes and
Yongkang You


-- 
Thanks & Regards,
Shaohui




* [1/7, v9] NUMA Hotplug Emulator: Documentation
  2010-12-10  7:31 ` shaohui.zheng
@ 2010-12-10  7:31   ` shaohui.zheng
  -1 siblings, 0 replies; 61+ messages in thread
From: shaohui.zheng @ 2010-12-10  7:31 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
	dave, gregkh, Haicheng Li, Shaohui Zheng

[-- Attachment #1: 001-hotplug-emulator-doc-x86_64-of-numa-hotplug-emulator.patch --]
[-- Type: text/plain, Size: 4113 bytes --]

From: Shaohui Zheng <shaohui.zheng@intel.com>

Add a text file, Documentation/x86/x86_64/numa_hotplug_emulator.txt,
that explains how to use the hotplug emulator.

Reviewed-By: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
---
Index: linux-hpe4/Documentation/x86/x86_64/numa_hotplug_emulator.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-hpe4/Documentation/x86/x86_64/numa_hotplug_emulator.txt	2010-12-10 13:43:19.573331001 +0800
@@ -0,0 +1,97 @@
+NUMA Hotplug Emulator for x86_64
+---------------------------------------------------
+
+The NUMA hotplug emulator emulates NUMA node hotplug purely in
+software. It is intended to help people easily debug and test
+node/CPU/memory hotplug related code on a machine without NUMA
+hotplug support, even a UMA machine or a virtual
+environment.
+
+1) Node hotplug emulation:
+
+Adds a numa=possible=<N> command line option to set an additional N nodes
+as being possible for memory hotplug.  This set of possible nodes
+controls nr_node_ids and the sizes of several dynamically allocated node
+arrays.
+
+This allows memory hotplug to create new nodes for newly added memory
+rather than binding it to existing nodes.
+
+For emulation on x86, it would be possible to set aside memory for hotplugged
+nodes (say, anything above 2G) and to add an additional four nodes as being
+possible on boot with
+
+	mem=2G numa=possible=4
+
+and then creating a new 128M node at runtime:
+
+	# echo 128M@0x80000000 > /sys/kernel/debug/mem_hotplug/add_node
+	On node 1 totalpages: 0
+	init_memory_mapping: 0000000080000000-0000000088000000
+	 0080000000 - 0088000000 page 2M
+
+Once the new node has been added, its memory can be onlined.  If this
+memory represents memory section 16, for example:
+
+	# echo online > /sys/devices/system/memory/memory16/state
+	Built 2 zonelists in Node order, mobility grouping on.  Total pages: 514846
+	Policy zone: Normal
+ [ The memory section(s) mapped to a particular node are visible via
+   /sys/devices/system/node/node1, in this example. ]
+
+2) CPU hotplug emulation:
+
+The emulator reserves CPUs through a grub parameter; the reserved CPUs can be
+hot-added and hot-removed in software, which emulates the process of physical
+CPU hotplug.
+
+When hotplugging a CPU with the emulator, we use a logical CPU to emulate the
+CPU socket hotplug process. On CPUs that support SMT, some logical CPUs are in
+the same socket, but with the emulator they may be located in different NUMA
+nodes. We put the logical CPU into a fake CPU socket and assign it a
+unique phys_proc_id. Each fake socket holds only one logical CPU.
+
+ - to hide CPUs
+	- Use the boot option "maxcpus=N" to hide CPUs;
+	  N is the number of CPUs to initialize; the rest will be hidden.
+	- Use the boot option "cpu_hpe=on" to enable CPU hotplug emulation;
+	  when cpu_hpe is enabled, the remaining CPUs will not be initialized.
+
+ - to hot-add CPU to node
+	# echo nid > cpu/probe
+
+ - to hot-remove CPU
+	# echo nid > cpu/release
+
+3) Memory hotplug emulation:
+
+The emulator reserves memory before the OS boots; the reserved memory region is
+removed from the e820 table. Each online node has an add_memory interface, and
+memory can be hot-added via the per-node add_memory debugfs interface.
+
+ - reserve memory through a kernel boot parameter
+ 	mem=1024m
+
+ - add a memory section to node 3
+    # echo 0x40000000 > mem_hotplug/node3/add_memory
+
+4) Scripts for hotplug testing
+
+These scripts are convenient when hot-adding memory/CPUs in batches.
+
+- Online all memory sections:
+for m in /sys/devices/system/memory/memory*;
+do
+	echo online > "$m/state";
+done
+
+- Online all CPUs:
+for c in /sys/devices/system/cpu/cpu[0-9]*;
+do
+	echo 1 > "$c/online";
+done
+
+- David Rientjes <rientjes@google.com>
+- Haicheng Li <haicheng.li@intel.com>
+- Shaohui Zheng <shaohui.zheng@intel.com>
+  Nov 2010

-- 
Thanks & Regards,
Shaohui




* [2/7, v9] NUMA Hotplug Emulator: Add numa=possible option
  2010-12-10  7:31 ` shaohui.zheng
@ 2010-12-10  7:31   ` shaohui.zheng
  -1 siblings, 0 replies; 61+ messages in thread
From: shaohui.zheng @ 2010-12-10  7:31 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
	dave, gregkh, Haicheng Li, Shaohui Zheng

[-- Attachment #1: 002-add-node-possible-option.patch --]
[-- Type: text/plain, Size: 3341 bytes --]

From:  David Rientjes <rientjes@google.com>

Adds a numa=possible=<N> command line option to set an additional N nodes
as being possible for memory hotplug.  This set of possible nodes
controls nr_node_ids and the sizes of several dynamically allocated node
arrays.

This allows memory hotplug to create new nodes for newly added memory
rather than binding it to existing nodes.

The first use-case for this will be node hotplug emulation which will use
these possible nodes to create new nodes to test the memory hotplug
callbacks and surrounding memory hotplug code.

CC: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
---
 Documentation/x86/x86_64/boot-options.txt |    4 ++++
 arch/x86/mm/numa_64.c                     |   18 +++++++++++++++---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
--- a/Documentation/x86/x86_64/boot-options.txt
+++ b/Documentation/x86/x86_64/boot-options.txt
@@ -174,6 +174,10 @@ NUMA
 		If given as an integer, fills all system RAM with N fake nodes
 		interleaved over physical nodes.
 
+  numa=possible=<N>
+		Sets an additional N nodes as being possible for memory
+		hotplug.
+
 ACPI
 
   acpi=off	Don't enable ACPI
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -33,6 +33,7 @@ s16 apicid_to_node[MAX_LOCAL_APIC] __cpuinitdata = {
 int numa_off __initdata;
 static unsigned long __initdata nodemap_addr;
 static unsigned long __initdata nodemap_size;
+static unsigned long __initdata numa_possible_nodes;
 
 /*
  * Map cpu index to node index
@@ -611,7 +612,7 @@ void __init initmem_init(unsigned long start_pfn, unsigned long last_pfn,
 
 #ifdef CONFIG_NUMA_EMU
 	if (cmdline && !numa_emulation(start_pfn, last_pfn, acpi, k8))
-		return;
+		goto out;
 	nodes_clear(node_possible_map);
 	nodes_clear(node_online_map);
 #endif
@@ -619,14 +620,14 @@ void __init initmem_init(unsigned long start_pfn, unsigned long last_pfn,
 #ifdef CONFIG_ACPI_NUMA
 	if (!numa_off && acpi && !acpi_scan_nodes(start_pfn << PAGE_SHIFT,
 						  last_pfn << PAGE_SHIFT))
-		return;
+		goto out;
 	nodes_clear(node_possible_map);
 	nodes_clear(node_online_map);
 #endif
 
 #ifdef CONFIG_K8_NUMA
 	if (!numa_off && k8 && !k8_scan_nodes())
-		return;
+		goto out;
 	nodes_clear(node_possible_map);
 	nodes_clear(node_online_map);
 #endif
@@ -646,6 +647,15 @@ void __init initmem_init(unsigned long start_pfn, unsigned long last_pfn,
 		numa_set_node(i, 0);
 	memblock_x86_register_active_regions(0, start_pfn, last_pfn);
 	setup_node_bootmem(0, start_pfn << PAGE_SHIFT, last_pfn << PAGE_SHIFT);
+out: __maybe_unused
+	for (i = 0; i < numa_possible_nodes; i++) {
+		int nid;
+
+		nid = first_unset_node(node_possible_map);
+		if (nid == MAX_NUMNODES)
+			break;
+		node_set(nid, node_possible_map);
+	}
 }
 
 unsigned long __init numa_free_all_bootmem(void)
@@ -675,6 +685,8 @@ static __init int numa_setup(char *opt)
 	if (!strncmp(opt, "noacpi", 6))
 		acpi_numa = -1;
 #endif
+	if (!strncmp(opt, "possible=", 9))
+		numa_possible_nodes = simple_strtoul(opt + 9, NULL, 0);
 	return 0;
 }
 early_param("numa", numa_setup);

-- 
Thanks & Regards,
Shaohui



^ permalink raw reply	[flat|nested] 61+ messages in thread

* [3/7, v9] NUMA Hotplug Emulator: Add node hotplug emulation
  2010-12-10  7:31 ` shaohui.zheng
@ 2010-12-10  7:31   ` shaohui.zheng
  -1 siblings, 0 replies; 61+ messages in thread
From: shaohui.zheng @ 2010-12-10  7:31 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
	dave, gregkh, Haicheng Li, Shaohui Zheng

[-- Attachment #1: 003-node-hotpluge-emulation.patch --]
[-- Type: text/plain, Size: 5488 bytes --]

From: David Rientjes <rientjes@google.com>

Add an interface to allow new nodes to be added when performing memory
hot-add.  This provides a convenient interface to test memory hotplug
notifier callbacks and surrounding hotplug code when new nodes are
onlined without actually having a machine with such hotpluggable SRAT
entries.

This adds a new debugfs interface at /sys/kernel/debug/mem_hotplug/add_node
that behaves in a similar way to the memory hot-add "probe" interface.
Its format is size@start, where "size" is the size of the new node to be
added and "start" is the physical address of the new memory.
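
As an illustration of the size@start format, here is a minimal userspace
sketch of the parsing step. It is an approximation only: model_memparse()
is a simplified stand-in for the kernel's memparse() (K, M and G suffixes
only), and parse_add_node() is a hypothetical helper name, not a function
from this patch.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Userspace approximation of the kernel's memparse(): parse a number and
 * scale it by an optional K, M or G suffix.
 */
uint64_t model_memparse(const char *s, char **end)
{
	uint64_t v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/* Split "size@start" into its two fields; returns 0 or -EINVAL. */
int parse_add_node(const char *buf, uint64_t *size, uint64_t *start)
{
	char *p;

	*size = model_memparse(buf, &p);
	if (*p != '@')
		return -EINVAL;
	*start = strtoull(p + 1, NULL, 0);
	return 0;
}
```

The handler in the patch additionally rejects any size smaller than one
memory section; that check is left out of the sketch.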

The new node id is a currently offline, but possible, node.  The bit must
be set in node_possible_map so that nr_node_ids is sized appropriately.

For emulation on x86, for example, it would be possible to set aside
memory for hotplugged nodes (say, anything above 2G) and to add an
additional four nodes as being possible on boot with

	mem=2G numa=possible=4

and then creating a new 128M node at runtime:

	# echo 128M@0x80000000 > /sys/kernel/debug/mem_hotplug/add_node
	On node 1 totalpages: 0
	init_memory_mapping: 0000000080000000-0000000088000000
	 0080000000 - 0088000000 page 2M

Once the new node has been added, its memory can be onlined.  If this
memory represents memory section 16, for example:

	# echo online > /sys/devices/system/memory/memory16/state
	Built 2 zonelists in Node order, mobility grouping on.  Total pages: 514846
	Policy zone: Normal

 [ The memory section(s) mapped to a particular node are visible via
   /sys/kernel/debug/mem_hotplug/node1, in this example. ]

The new node is now hotplugged and ready for testing.
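
The "memory section 16" in the example follows from the start address,
assuming 128M memory sections (2^27 bytes, which is the usual x86_64
SPARSEMEM section size; other configurations may differ). The helper names
below are illustrative only.

```c
#include <stdint.h>

/* Assumed section size: 2^27 bytes (128M), as on x86_64 with SPARSEMEM. */
#define MODEL_SECTION_SIZE_BITS 27

/* Index of the memory section containing physical address addr. */
uint64_t addr_to_section(uint64_t addr)
{
	return addr >> MODEL_SECTION_SIZE_BITS;
}

/* Number of whole sections covered by size bytes. */
uint64_t size_to_sections(uint64_t size)
{
	return size >> MODEL_SECTION_SIZE_BITS;
}
```

Under that assumption, 0x80000000 >> 27 is 16, and a 128M node covers
exactly one section, so only memory16 needs to be onlined.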

CC: Haicheng Li <haicheng.li@intel.com>
CC: Greg KH <gregkh@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
---
 Documentation/memory-hotplug.txt |   24 +++++++++++++++
 mm/memory_hotplug.c              |   59 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+), 0 deletions(-)
Index: linux-hpe4/Documentation/memory-hotplug.txt
===================================================================
--- linux-hpe4.orig/Documentation/memory-hotplug.txt	2010-11-30 12:40:43.527622001 +0800
+++ linux-hpe4/Documentation/memory-hotplug.txt	2010-11-30 14:11:11.827622000 +0800
@@ -18,6 +18,7 @@
 4. Physical memory hot-add phase
   4.1 Hardware(Firmware) Support
   4.2 Notify memory hot-add event by hand
+  4.3 Node hotplug emulation
 5. Logical Memory hot-add phase
   5.1. State of memory
   5.2. How to online memory
@@ -215,6 +216,29 @@
 Please see "How to online memory" in this text.
 
 
+4.3 Node hotplug emulation
+------------
+With debugfs, it is possible to test node hotplug by assigning the newly
+added memory to a new node id when using a different interface with a similar
+behavior to "probe" described in section 4.2.  If a node id is possible
+(there are bits in /sys/devices/system/node/possible that are not online),
+then it may be used to emulate a newly added node as the result of memory
+hotplug by using the debugfs "add_node" interface.
+
+The add_node interface is located at "mem_hotplug/add_node" at the debugfs
+mount point.
+
+You can create a new node of a specified size starting at the physical
+address of new memory by
+
+% echo size@start_address_of_new_memory > /sys/kernel/debug/mem_hotplug/add_node
+
+Where "size" can be represented in megabytes or gigabytes (for example,
+"128M" or "1G").  The minimum size is that of a memory section.
+
+Once the new node has been added, it is possible to online the memory by
+toggling the "state" of its memory section(s) as described in section 5.1.
+
 
 ------------------------------
 5. Logical Memory hot-add phase
Index: linux-hpe4/mm/memory_hotplug.c
===================================================================
--- linux-hpe4.orig/mm/memory_hotplug.c	2010-11-30 12:40:43.757622001 +0800
+++ linux-hpe4/mm/memory_hotplug.c	2010-11-30 14:02:33.877622002 +0800
@@ -924,3 +924,63 @@
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 EXPORT_SYMBOL_GPL(remove_memory);
+
+#ifdef CONFIG_DEBUG_FS
+#include <linux/debugfs.h>
+
+static struct dentry *memhp_debug_root;
+
+static ssize_t add_node_store(struct file *file, const char __user *buf,
+				size_t count, loff_t *ppos)
+{
+	nodemask_t mask;
+	u64 start, size;
+	char buffer[64];
+	char *p;
+	int nid;
+	int ret;
+
+	memset(buffer, 0, sizeof(buffer));
+	if (count > sizeof(buffer) - 1)
+		count = sizeof(buffer) - 1;
+	if (copy_from_user(buffer, buf, count))
+		return -EFAULT;
+
+	size = memparse(buffer, &p);
+	if (size < (PAGES_PER_SECTION << PAGE_SHIFT))
+		return -EINVAL;
+	if (*p != '@')
+		return -EINVAL;
+
+	start = simple_strtoull(p + 1, NULL, 0);
+
+	nodes_andnot(mask, node_possible_map, node_online_map);
+	nid = first_node(mask);
+	if (nid == MAX_NUMNODES)
+		return -ENOMEM;
+
+	ret = add_memory(nid, start, size);
+	return ret ? ret : count;
+}
+
+static const struct file_operations add_node_file_ops = {
+	.write		= add_node_store,
+	.llseek		= generic_file_llseek,
+};
+
+static int __init node_debug_init(void)
+{
+	if (!memhp_debug_root)
+		memhp_debug_root = debugfs_create_dir("mem_hotplug", NULL);
+	if (!memhp_debug_root)
+		return -ENOMEM;
+
+	if (!debugfs_create_file("add_node", S_IWUSR, memhp_debug_root,
+			NULL, &add_node_file_ops))
+		return -ENOMEM;
+
+	return 0;
+}
+
+module_init(node_debug_init);
+#endif /* CONFIG_DEBUG_FS */

-- 
Thanks & Regards,
Shaohui



^ permalink raw reply	[flat|nested] 61+ messages in thread

* [4/7, v9] NUMA Hotplug Emulator: Abstract cpu register functions
  2010-12-10  7:31 ` shaohui.zheng
@ 2010-12-10  7:31   ` shaohui.zheng
  -1 siblings, 0 replies; 61+ messages in thread
From: shaohui.zheng @ 2010-12-10  7:31 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
	dave, gregkh, Shaohui Zheng

[-- Attachment #1: 004-hotplug-emulator-x86-abstract-cpu-register-functions.patch --]
[-- Type: text/plain, Size: 3359 bytes --]

From: Shaohui Zheng <shaohui.zheng@intel.com>

Abstract the cpu register functions and provide a more flexible interface,
register_cpu_node, which registers a cpu under a specified node.  It can be
used to add a cpu to a fake node.
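
The shape of the change can be shown with a small userspace model. Every
name below (model_cpu, model_cpu_node, model_register_cpu_node,
model_register_cpu) is made up for illustration; the model only mirrors the
pattern of a general function taking an explicit node id plus a thin
wrapper that keeps the old default for existing callers.

```c
/* Hypothetical CPU-to-node table standing in for cpu_to_node(). */
static const int model_cpu_node[4] = { 0, 0, 1, 1 };

struct model_cpu {
	int id;
	int node_id;
};

/* General form: the caller chooses the node explicitly. */
int model_register_cpu_node(struct model_cpu *cpu, int num, int nid)
{
	cpu->id = num;
	cpu->node_id = nid;
	return 0;
}

/* Compatibility wrapper: existing callers keep the default mapping. */
int model_register_cpu(struct model_cpu *cpu, int num)
{
	return model_register_cpu_node(cpu, num, model_cpu_node[num]);
}
```

Keeping the old entry point as a wrapper means no existing caller has to
change, while the emulator can pass a fake node id directly.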

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
---
Index: linux-hpe4/arch/x86/include/asm/cpu.h
===================================================================
--- linux-hpe4.orig/arch/x86/include/asm/cpu.h	2010-11-17 09:00:59.742608402 +0800
+++ linux-hpe4/arch/x86/include/asm/cpu.h	2010-11-17 09:01:10.192838977 +0800
@@ -27,6 +27,7 @@
 
 #ifdef CONFIG_HOTPLUG_CPU
 extern int arch_register_cpu(int num);
+extern int arch_register_cpu_node(int num, int nid);
 extern void arch_unregister_cpu(int);
 #endif
 
Index: linux-hpe4/arch/x86/kernel/topology.c
===================================================================
--- linux-hpe4.orig/arch/x86/kernel/topology.c	2010-11-17 09:01:01.053461766 +0800
+++ linux-hpe4/arch/x86/kernel/topology.c	2010-11-17 10:05:32.934085248 +0800
@@ -52,6 +52,15 @@
 }
 EXPORT_SYMBOL(arch_register_cpu);
 
+int __ref arch_register_cpu_node(int num, int nid)
+{
+	if (num)
+		per_cpu(cpu_devices, num).cpu.hotpluggable = 1;
+
+	return register_cpu_node(&per_cpu(cpu_devices, num).cpu, num, nid);
+}
+EXPORT_SYMBOL(arch_register_cpu_node);
+
 void arch_unregister_cpu(int num)
 {
 	unregister_cpu(&per_cpu(cpu_devices, num).cpu);
Index: linux-hpe4/drivers/base/cpu.c
===================================================================
--- linux-hpe4.orig/drivers/base/cpu.c	2010-11-17 09:01:01.053461766 +0800
+++ linux-hpe4/drivers/base/cpu.c	2010-11-17 10:05:32.943465010 +0800
@@ -208,17 +208,18 @@
 static SYSDEV_CLASS_ATTR(offline, 0444, print_cpus_offline, NULL);
 
 /*
- * register_cpu - Setup a sysfs device for a CPU.
+ * register_cpu_node - Setup a sysfs device for a CPU.
  * @cpu - cpu->hotpluggable field set to 1 will generate a control file in
  *	  sysfs for this CPU.
  * @num - CPU number to use when creating the device.
+ * @nid - Node ID to use, if any.
  *
  * Initialize and register the CPU device.
  */
-int __cpuinit register_cpu(struct cpu *cpu, int num)
+int __cpuinit register_cpu_node(struct cpu *cpu, int num, int nid)
 {
 	int error;
-	cpu->node_id = cpu_to_node(num);
+	cpu->node_id = nid;
 	cpu->sysdev.id = num;
 	cpu->sysdev.cls = &cpu_sysdev_class;
 
@@ -229,7 +230,7 @@
 	if (!error)
 		per_cpu(cpu_sys_devices, num) = &cpu->sysdev;
 	if (!error)
-		register_cpu_under_node(num, cpu_to_node(num));
+		register_cpu_under_node(num, nid);
 
 #ifdef CONFIG_KEXEC
 	if (!error)
Index: linux-hpe4/include/linux/cpu.h
===================================================================
--- linux-hpe4.orig/include/linux/cpu.h	2010-11-17 09:00:59.772898926 +0800
+++ linux-hpe4/include/linux/cpu.h	2010-11-17 10:05:32.954085309 +0800
@@ -30,7 +30,13 @@
 	struct sys_device sysdev;
 };
 
-extern int register_cpu(struct cpu *cpu, int num);
+extern int register_cpu_node(struct cpu *cpu, int num, int nid);
+
+static inline int register_cpu(struct cpu *cpu, int num)
+{
+	return register_cpu_node(cpu, num, cpu_to_node(num));
+}
+
 extern struct sys_device *get_cpu_sysdev(unsigned cpu);
 
 extern int cpu_add_sysdev_attr(struct sysdev_attribute *attr);

-- 
Thanks & Regards,
Shaohui



^ permalink raw reply	[flat|nested] 61+ messages in thread

* [5/7, v9] NUMA Hotplug Emulator: Support cpu probe/release in x86_64
  2010-12-10  7:31 ` shaohui.zheng
@ 2010-12-10  7:31   ` shaohui.zheng
  -1 siblings, 0 replies; 61+ messages in thread
From: shaohui.zheng @ 2010-12-10  7:31 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
	dave, gregkh, Ingo Molnar, Len Brown, Yinghai Lu, Tejun Heo,
	Shaohui Zheng, Haicheng Li

[-- Attachment #1: 005-hotplug-emulator-x86-support-cpu-probe-release-in-x86.patch --]
[-- Type: text/plain, Size: 10852 bytes --]

From: Shaohui Zheng <shaohui.zheng@intel.com>

Physical CPU hot-add/hot-remove is supported on some hardware, and the
current Linux kernel already supports it. The NUMA Hotplug Emulator provides
a mechanism to emulate the process in software. It can be used for testing
or debugging purposes.

Physical CPU hotplug is different from logical CPU online/offline. Logical
online/offline is controlled by the interface
/sys/devices/system/cpu/cpuX/online. The CPU hotplug emulator uses a
probe/release interface, which makes it possible to automate and stress-test
CPU hotplug.

Add the cpu probe/release interface under sysfs for x86_64. Users can use this
interface to emulate the cpu hot-add and hot-remove process.

Directions:
*) Reserve CPUs through a grub parameter like:
	maxcpus=4

The remaining CPUs will not be initialized.

*) Probe a CPU
Use the probe interface to hot-add a new CPU:
	echo nid > /sys/devices/system/cpu/probe

*) Release a CPU
	echo cpu > /sys/devices/system/cpu/release

A reserved CPU will be hot-added to the specified node.
1) nid == 0: the CPU will be added to the real node that the CPU
should be in.
2) nid != 0: the CPU is added to node nid even though it is a fake node.
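
How a reserved CPU might be chosen on probe can be sketched in userspace.
This is a model under stated assumptions, not the code in this patch: the
two 64-bit masks stand in for the present-CPU mask and for the set of CPUs
that already have a registered device, and pick_reserved_cpu() is a
hypothetical name.

```c
#include <stdint.h>

/*
 * Return the lowest CPU that is present but has no registered device,
 * or -1 if every present CPU is already registered.
 */
int pick_reserved_cpu(uint64_t present, uint64_t registered)
{
	uint64_t free_cpus = present & ~registered;
	int cpu;

	for (cpu = 0; cpu < 64; cpu++)
		if (free_cpus & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}
```

With eight present CPUs and maxcpus=4 (CPUs 0-3 registered at boot), the
model picks CPU 4. Returning -1 when nothing is free keeps "no reserved CPU
left" distinguishable from CPU 0, which is never a candidate.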

CC: Ingo Molnar <mingo@elte.hu>
CC: Len Brown <len.brown@intel.com>
CC: Yinghai Lu <Yinghai.Lu@Sun.COM>
CC: Tejun Heo <tj@kernel.org>
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
---
This patch is based on Tejun's unification of the 32 and 64 bit NUMA boot paths,
 specifically the patch at http://marc.info/?l=linux-kernel&m=129087151912379.
Index: linux-hpe4/arch/x86/kernel/acpi/boot.c
===================================================================
--- linux-hpe4.orig/arch/x86/kernel/acpi/boot.c	2010-12-10 13:42:34.553331000 +0800
+++ linux-hpe4/arch/x86/kernel/acpi/boot.c	2010-12-10 14:48:32.113331001 +0800
@@ -668,8 +668,39 @@
 }
 EXPORT_SYMBOL(acpi_map_lsapic);
 
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+static void acpi_map_cpu2node_emu(int cpu, int physid, int nid)
+{
+#ifdef CONFIG_ACPI_NUMA
+	set_apicid_to_node(physid, nid);
+	numa_set_node(cpu, nid);
+#endif
+}
+
+static u16 cpu_to_apicid_saved[CONFIG_NR_CPUS];
+int __ref acpi_map_lsapic_emu(int pcpu, int nid)
+{
+	/* backup cpu apicid to array cpu_to_apicid_saved */
+	if (cpu_to_apicid_saved[pcpu] == 0 &&
+		per_cpu(x86_cpu_to_apicid, pcpu) != BAD_APICID)
+		cpu_to_apicid_saved[pcpu] = per_cpu(x86_cpu_to_apicid, pcpu);
+
+	per_cpu(x86_cpu_to_apicid, pcpu) = cpu_to_apicid_saved[pcpu];
+	acpi_map_cpu2node_emu(pcpu, per_cpu(x86_cpu_to_apicid, pcpu), nid);
+
+	return pcpu;
+}
+EXPORT_SYMBOL(acpi_map_lsapic_emu);
+#endif
+
 int acpi_unmap_lsapic(int cpu)
 {
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+	/* backup cpu apicid to array cpu_to_apicid_saved */
+	if (cpu_to_apicid_saved[cpu] == 0 &&
+		per_cpu(x86_cpu_to_apicid, cpu) != BAD_APICID)
+		cpu_to_apicid_saved[cpu] = per_cpu(x86_cpu_to_apicid, cpu);
+#endif
 	per_cpu(x86_cpu_to_apicid, cpu) = -1;
 	set_cpu_present(cpu, false);
 	num_processors--;
Index: linux-hpe4/arch/x86/kernel/smpboot.c
===================================================================
--- linux-hpe4.orig/arch/x86/kernel/smpboot.c	2010-12-10 13:42:34.563331000 +0800
+++ linux-hpe4/arch/x86/kernel/smpboot.c	2010-12-10 14:48:32.113331001 +0800
@@ -103,8 +103,6 @@
         mutex_unlock(&x86_cpu_hotplug_driver_mutex);
 }
 
-ssize_t arch_cpu_probe(const char *buf, size_t count) { return -1; }
-ssize_t arch_cpu_release(const char *buf, size_t count) { return -1; }
 #else
 static struct task_struct *idle_thread_array[NR_CPUS] __cpuinitdata ;
 #define get_idle_for_cpu(x)      (idle_thread_array[(x)])
Index: linux-hpe4/arch/x86/kernel/topology.c
===================================================================
--- linux-hpe4.orig/arch/x86/kernel/topology.c	2010-12-10 14:39:43.333331000 +0800
+++ linux-hpe4/arch/x86/kernel/topology.c	2010-12-10 14:49:56.043331000 +0800
@@ -30,6 +30,9 @@
 #include <linux/init.h>
 #include <linux/smp.h>
 #include <asm/cpu.h>
+#include <linux/cpu.h>
+#include <linux/topology.h>
+#include <linux/acpi.h>
 
 static DEFINE_PER_CPU(struct x86_cpu, cpu_devices);
 
@@ -66,6 +69,78 @@
 	unregister_cpu(&per_cpu(cpu_devices, num).cpu);
 }
 EXPORT_SYMBOL(arch_unregister_cpu);
+
+ssize_t arch_cpu_probe(const char *buf, size_t count)
+{
+	int nid = 0;
+	int num = 0, selected = num_possible_cpus();
+
+	/* check parameters */
+	if (!buf || count < 2)
+		return -EPERM;
+
+	nid = simple_strtoul(buf, NULL, 0);
+	printk(KERN_DEBUG "Add a cpu to node : %d\n", nid);
+
+	if (nid < 0 || nid > nr_node_ids - 1) {
+		printk(KERN_ERR "Invalid NUMA node id: %d (0 <= nid < %d).\n",
+			nid, nr_node_ids);
+		return -EPERM;
+	}
+
+	if (!node_online(nid)) {
+		printk(KERN_ERR "NUMA node %d is not online, give up.\n", nid);
+		return -EPERM;
+	}
+
+	/* find first uninitialized cpu */
+	for_each_present_cpu(num) {
+		if (per_cpu(cpu_sys_devices, num) == NULL) {
+			selected = num;
+			break;
+		}
+	}
+
+	if (selected >= num_possible_cpus()) {
+		printk(KERN_ERR "No free cpu, give up cpu probing.\n");
+		return -EPERM;
+	}
+
+	/* register cpu */
+	arch_register_cpu_node(selected, nid);
+	acpi_map_lsapic_emu(selected, nid);
+
+	return count;
+}
+EXPORT_SYMBOL(arch_cpu_probe);
+
+ssize_t arch_cpu_release(const char *buf, size_t count)
+{
+	int cpu = 0;
+
+	cpu = simple_strtoul(buf, NULL, 0);
+	/* cpu 0 is not hotpluggable */
+	if (cpu == 0) {
+		printk(KERN_ERR "can not release cpu 0.\n");
+		return -EPERM;
+	}
+
+	if (cpu_online(cpu)) {
+		printk(KERN_DEBUG "offline cpu %d.\n", cpu);
+		if (cpu_down(cpu)) {
+			printk(KERN_ERR "failed to offline cpu %d, give up.\n", cpu);
+			return -EPERM;
+		}
+
+	}
+
+	arch_unregister_cpu(cpu);
+	acpi_unmap_lsapic(cpu);
+
+	return count;
+}
+EXPORT_SYMBOL(arch_cpu_release);
+
 #else /* CONFIG_HOTPLUG_CPU */
 
 static int __init arch_register_cpu(int num)
@@ -83,8 +158,14 @@
 		register_one_node(i);
 #endif
 
-	for_each_present_cpu(i)
-		arch_register_cpu(i);
+	/*
+	 * When cpu hotplug emulation is enabled, register only the online
+	 * cpus; the rest are reserved for cpu probe.
+	 */
+	for_each_present_cpu(i) {
+		if ((cpu_hpe_on && cpu_online(i)) || !cpu_hpe_on)
+			arch_register_cpu(i);
+	}
 
 	return 0;
 }
Index: linux-hpe4/arch/x86/mm/numa_64.c
===================================================================
--- linux-hpe4.orig/arch/x86/mm/numa_64.c	2010-12-10 14:39:37.153331000 +0800
+++ linux-hpe4/arch/x86/mm/numa_64.c	2010-12-10 14:48:32.123331001 +0800
@@ -13,6 +13,7 @@
 #include <linux/module.h>
 #include <linux/nodemask.h>
 #include <linux/sched.h>
+#include <linux/cpu.h>
 
 #include <asm/e820.h>
 #include <asm/proto.h>
@@ -667,3 +668,17 @@
 		return __apicid_to_node[apicid];
 	return NUMA_NO_NODE;
 }
+
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+static __init int cpu_hpe_setup(char *opt)
+{
+	if (!opt)
+		return -EINVAL;
+
+	if (!strncmp(opt, "on", 2) || !strncmp(opt, "1", 1))
+		cpu_hpe_on = 1;
+
+	return 0;
+}
+early_param("cpu_hpe", cpu_hpe_setup);
+#endif  /* CONFIG_ARCH_CPU_PROBE_RELEASE */
Index: linux-hpe4/drivers/acpi/processor_driver.c
===================================================================
--- linux-hpe4.orig/drivers/acpi/processor_driver.c	2010-12-10 13:42:34.593331000 +0800
+++ linux-hpe4/drivers/acpi/processor_driver.c	2010-12-10 14:48:32.143331001 +0800
@@ -542,6 +542,14 @@
 		goto err_free_cpumask;
 
 	sysdev = get_cpu_sysdev(pr->id);
+	/*
+	 * The cpu is reserved for hotplug emulation; a reserved cpu can be
+	 * hot-added through the cpu probe interface. Return directly.
+	 */
+	if (sysdev == NULL) {
+		goto out;
+	}
+
 	if (sysfs_create_link(&device->dev.kobj, &sysdev->kobj, "sysdev")) {
 		result = -EFAULT;
 		goto err_remove_fs;
@@ -582,6 +590,7 @@
 		goto err_remove_sysfs;
 	}
 
+out:
 	return 0;
 
 err_remove_sysfs:
Index: linux-hpe4/drivers/base/cpu.c
===================================================================
--- linux-hpe4.orig/drivers/base/cpu.c	2010-12-10 14:39:43.333331000 +0800
+++ linux-hpe4/drivers/base/cpu.c	2010-12-10 14:48:32.143331001 +0800
@@ -22,9 +22,15 @@
 };
 EXPORT_SYMBOL(cpu_sysdev_class);
 
-static DEFINE_PER_CPU(struct sys_device *, cpu_sys_devices);
+DEFINE_PER_CPU(struct sys_device *, cpu_sys_devices);
 
 #ifdef CONFIG_HOTPLUG_CPU
+/*
+ * cpu_hpe_on is a switch to enable/disable cpu hotplug emulation. It is
+ * disabled by default; enable it through the grub parameter cpu_hpe=on.
+ */
+int cpu_hpe_on;
+
 static ssize_t show_online(struct sys_device *dev, struct sysdev_attribute *attr,
 			   char *buf)
 {
Index: linux-hpe4/include/linux/acpi.h
===================================================================
--- linux-hpe4.orig/include/linux/acpi.h	2010-12-10 13:42:34.613331000 +0800
+++ linux-hpe4/include/linux/acpi.h	2010-12-10 14:48:32.153331001 +0800
@@ -102,6 +102,7 @@
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 /* Arch dependent functions for cpu hotplug support */
 int acpi_map_lsapic(acpi_handle handle, int *pcpu);
+int acpi_map_lsapic_emu(int pcpu, int nid);
 int acpi_unmap_lsapic(int cpu);
 #endif /* CONFIG_ACPI_HOTPLUG_CPU */
 
Index: linux-hpe4/include/linux/cpu.h
===================================================================
--- linux-hpe4.orig/include/linux/cpu.h	2010-12-10 14:39:43.333331000 +0800
+++ linux-hpe4/include/linux/cpu.h	2010-12-10 14:48:32.153331001 +0800
@@ -25,6 +25,8 @@
 	struct sys_device sysdev;
 };
 
+DECLARE_PER_CPU(struct sys_device *, cpu_sys_devices);
+
 extern int register_cpu_node(struct cpu *cpu, int num, int nid);
 
 static inline int register_cpu(struct cpu *cpu, int num)
@@ -144,6 +146,7 @@
 #define register_hotcpu_notifier(nb)	register_cpu_notifier(nb)
 #define unregister_hotcpu_notifier(nb)	unregister_cpu_notifier(nb)
 int cpu_down(unsigned int cpu);
+extern int cpu_hpe_on;
 
 #ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
 extern void cpu_hotplug_driver_lock(void);
@@ -166,6 +169,7 @@
 /* These aren't inline functions due to a GCC bug. */
 #define register_hotcpu_notifier(nb)	({ (void)(nb); 0; })
 #define unregister_hotcpu_notifier(nb)	({ (void)(nb); })
+static int cpu_hpe_on;
 #endif		/* CONFIG_HOTPLUG_CPU */
 
 #ifdef CONFIG_PM_SLEEP_SMP
Index: linux-hpe4/Documentation/x86/x86_64/boot-options.txt
===================================================================
--- linux-hpe4.orig/Documentation/x86/x86_64/boot-options.txt	2010-12-10 14:39:37.153331000 +0800
+++ linux-hpe4/Documentation/x86/x86_64/boot-options.txt	2010-12-10 14:48:32.153331001 +0800
@@ -320,3 +320,8 @@
 		Do not use GB pages for kernel direct mappings.
 	gbpages
 		Use GB pages for kernel direct mappings.
+	cpu_hpe=on/off
+		Enable/disable software emulation of CPU hotplug. When cpu_hpe=on,
+		sysfs provides a probe/release interface to hot-add/remove CPUs dynamically.
+		Use maxcpus=<N> to reserve CPUs for later probing.
+		This option is disabled by default.

-- 
Thanks & Regards,
Shaohui
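
A note on the cpu_hpe= parsing in the patch above: cpu_hpe_setup() compares
only a prefix of the option string, so values such as "online" or "10" also
enable emulation, and "off" simply leaves the default (disabled) in place.
The following is a minimal userspace sketch of that comparison, not kernel
code; cpu_hpe_parse() is a hypothetical name, and it returns -1 where the
kernel function returns -EINVAL.

```c
#include <string.h>

/*
 * Userspace model of the comparison in cpu_hpe_setup().
 * Returns 1 if emulation would be switched on, 0 if the default is kept,
 * and -1 for a missing option string (-EINVAL in the kernel code).
 */
int cpu_hpe_parse(const char *opt)
{
	if (!opt)
		return -1;

	/* strncmp() with a fixed length matches any string with that prefix */
	if (!strncmp(opt, "on", 2) || !strncmp(opt, "1", 1))
		return 1;

	return 0;
}
```

If exact matching is wanted, comparing the whole string with strcmp() would
be one option.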



^ permalink raw reply	[flat|nested] 61+ messages in thread

* [5/7, v9] NUMA Hotplug Emulator: Support cpu probe/release in x86_64
@ 2010-12-10  7:31   ` shaohui.zheng
  0 siblings, 0 replies; 61+ messages in thread
From: shaohui.zheng @ 2010-12-10  7:31 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
	dave, gregkh, Ingo Molnar, Len Brown, Yinghai Lu, Tejun Heo,
	Shaohui Zheng, Haicheng Li

[-- Attachment #1: 005-hotplug-emulator-x86-support-cpu-probe-release-in-x86.patch --]
[-- Type: text/plain, Size: 11148 bytes --]

From: Shaohui Zheng <shaohui.zheng@intel.com>

Physical CPU hot-add/hot-remove is supported on some hardware, and the
current Linux kernel already supports it. The NUMA Hotplug Emulator provides
a mechanism to emulate the process in software. It can be used for testing
or debugging purposes.

Physical CPU hotplug is different from logical CPU online/offline. Logical
online/offline is controlled by the interface
/sys/devices/system/cpu/cpuX/online. The CPU hotplug emulator uses the
probe/release interface instead, which makes it possible to automate and
stress-test CPU hotplug.

Add the probe/release cpu interface under sysfs for x86_64. Users can use
this interface to emulate the cpu hot-add and hot-remove process.

Directions:
*) Reserve CPUs through a grub parameter like:
	maxcpus=4

The remaining CPUs will not be initialized.

*) Probe a CPU
We can use the probe interface to hot-add new CPUs:
	echo nid > /sys/devices/system/cpu/probe

*) Release a CPU
	echo cpu > /sys/devices/system/cpu/release

A reserved CPU will be hot-added to the specified node.
1) nid == 0: the CPU will be added to the real node that the CPU
should be in.
2) nid != 0: the CPU is added to node nid even though it is a fake node.

CC: Ingo Molnar <mingo@elte.hu>
CC: Len Brown <len.brown@intel.com>
CC: Yinghai Lu <Yinghai.Lu@Sun.COM>
CC: Tejun Heo <tj@kernel.org>
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
---
This patch is based on Tejun's unification of the 32 and 64 bit NUMA boot paths,
 specifically the patch at http://marc.info/?l=linux-kernel&m=129087151912379.
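
For test automation, the probe and release steps from the changelog can also
be driven from a small C helper instead of echo. The sketch below is only an
illustration: hpe_write_id() is a hypothetical name, and the two paths are
the ones given in the changelog. It writes the id followed by a newline,
because arch_cpu_probe() in this patch rejects writes shorter than two bytes.

```c
#include <stdio.h>

/* sysfs paths as given in the changelog */
#define HPE_PROBE_PATH   "/sys/devices/system/cpu/probe"
#define HPE_RELEASE_PATH "/sys/devices/system/cpu/release"

/*
 * Hypothetical helper, not part of the patch: write "<id>\n" to path,
 * mimicking `echo id > path`. Returns 0 on success, -1 on failure.
 */
int hpe_write_id(const char *path, unsigned int id)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	/* the trailing newline keeps the write at two bytes or more */
	ok = fprintf(f, "%u\n", id) > 0;

	/* stdio buffers the data, so a rejected write shows up at close time */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

On a kernel carrying this patch, hpe_write_id(HPE_PROBE_PATH, 1) would
correspond to "echo 1 > /sys/devices/system/cpu/probe", and
hpe_write_id(HPE_RELEASE_PATH, 8) to releasing cpu 8.
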
Index: linux-hpe4/arch/x86/kernel/acpi/boot.c
===================================================================
--- linux-hpe4.orig/arch/x86/kernel/acpi/boot.c	2010-12-10 13:42:34.553331000 +0800
+++ linux-hpe4/arch/x86/kernel/acpi/boot.c	2010-12-10 14:48:32.113331001 +0800
@@ -668,8 +668,39 @@
 }
 EXPORT_SYMBOL(acpi_map_lsapic);
 
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+static void acpi_map_cpu2node_emu(int cpu, int physid, int nid)
+{
+#ifdef CONFIG_ACPI_NUMA
+	set_apicid_to_node(physid, nid);
+	numa_set_node(cpu, nid);
+#endif
+}
+
+static u16 cpu_to_apicid_saved[CONFIG_NR_CPUS];
+int __ref acpi_map_lsapic_emu(int pcpu, int nid)
+{
+	/* backup cpu apicid to array cpu_to_apicid_saved */
+	if (cpu_to_apicid_saved[pcpu] == 0 &&
+		per_cpu(x86_cpu_to_apicid, pcpu) != BAD_APICID)
+		cpu_to_apicid_saved[pcpu] = per_cpu(x86_cpu_to_apicid, pcpu);
+
+	per_cpu(x86_cpu_to_apicid, pcpu) = cpu_to_apicid_saved[pcpu];
+	acpi_map_cpu2node_emu(pcpu, per_cpu(x86_cpu_to_apicid, pcpu), nid);
+
+	return pcpu;
+}
+EXPORT_SYMBOL(acpi_map_lsapic_emu);
+#endif
+
 int acpi_unmap_lsapic(int cpu)
 {
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+	/* backup cpu apicid to array cpu_to_apicid_saved */
+	if (cpu_to_apicid_saved[cpu] == 0 &&
+		per_cpu(x86_cpu_to_apicid, cpu) != BAD_APICID)
+		cpu_to_apicid_saved[cpu] = per_cpu(x86_cpu_to_apicid, cpu);
+#endif
 	per_cpu(x86_cpu_to_apicid, cpu) = -1;
 	set_cpu_present(cpu, false);
 	num_processors--;
Index: linux-hpe4/arch/x86/kernel/smpboot.c
===================================================================
--- linux-hpe4.orig/arch/x86/kernel/smpboot.c	2010-12-10 13:42:34.563331000 +0800
+++ linux-hpe4/arch/x86/kernel/smpboot.c	2010-12-10 14:48:32.113331001 +0800
@@ -103,8 +103,6 @@
         mutex_unlock(&x86_cpu_hotplug_driver_mutex);
 }
 
-ssize_t arch_cpu_probe(const char *buf, size_t count) { return -1; }
-ssize_t arch_cpu_release(const char *buf, size_t count) { return -1; }
 #else
 static struct task_struct *idle_thread_array[NR_CPUS] __cpuinitdata ;
 #define get_idle_for_cpu(x)      (idle_thread_array[(x)])
Index: linux-hpe4/arch/x86/kernel/topology.c
===================================================================
--- linux-hpe4.orig/arch/x86/kernel/topology.c	2010-12-10 14:39:43.333331000 +0800
+++ linux-hpe4/arch/x86/kernel/topology.c	2010-12-10 14:49:56.043331000 +0800
@@ -30,6 +30,9 @@
 #include <linux/init.h>
 #include <linux/smp.h>
 #include <asm/cpu.h>
+#include <linux/cpu.h>
+#include <linux/topology.h>
+#include <linux/acpi.h>
 
 static DEFINE_PER_CPU(struct x86_cpu, cpu_devices);
 
@@ -66,6 +69,78 @@
 	unregister_cpu(&per_cpu(cpu_devices, num).cpu);
 }
 EXPORT_SYMBOL(arch_unregister_cpu);
+
+ssize_t arch_cpu_probe(const char *buf, size_t count)
+{
+	int nid = 0;
+	int num = 0, selected = 0;
+
+	/* check parameters */
+	if (!buf || count < 2)
+		return -EPERM;
+
+	nid = simple_strtoul(buf, NULL, 0);
+	printk(KERN_DEBUG "Add a cpu to node : %d\n", nid);
+
+	if (nid < 0 || nid > nr_node_ids - 1) {
+		printk(KERN_ERR "Invalid NUMA node id: %d (0 <= nid < %d).\n",
+			nid, nr_node_ids);
+		return -EPERM;
+	}
+
+	if (!node_online(nid)) {
+		printk(KERN_ERR "NUMA node %d is not online, give up.\n", nid);
+		return -EPERM;
+	}
+
+	/* find first uninitialized cpu */
+	for_each_present_cpu(num) {
+		if (per_cpu(cpu_sys_devices, num) == NULL) {
+			selected = num;
+			break;
+		}
+	}
+
+	if (selected >= num_possible_cpus()) {
+		printk(KERN_ERR "No free cpu, give up cpu probing.\n");
+		return -EPERM;
+	}
+
+	/* register cpu */
+	arch_register_cpu_node(selected, nid);
+	acpi_map_lsapic_emu(selected, nid);
+
+	return count;
+}
+EXPORT_SYMBOL(arch_cpu_probe);
+
+ssize_t arch_cpu_release(const char *buf, size_t count)
+{
+	int cpu = 0;
+
+	cpu = simple_strtoul(buf, NULL, 0);
+	/* cpu 0 is not hotpluggable */
+	if (cpu == 0) {
+		printk(KERN_ERR "can not release cpu 0.\n");
+		return -EPERM;
+	}
+
+	if (cpu_online(cpu)) {
+		printk(KERN_DEBUG "offline cpu %d.\n", cpu);
+		if (cpu_down(cpu)) {
+			printk(KERN_ERR "failed to offline cpu %d, give up.\n", cpu);
+			return -EPERM;
+		}
+
+	}
+
+	arch_unregister_cpu(cpu);
+	acpi_unmap_lsapic(cpu);
+
+	return count;
+}
+EXPORT_SYMBOL(arch_cpu_release);
+
 #else /* CONFIG_HOTPLUG_CPU */
 
 static int __init arch_register_cpu(int num)
@@ -83,8 +158,14 @@
 		register_one_node(i);
 #endif
 
-	for_each_present_cpu(i)
-		arch_register_cpu(i);
+	/*
+	 * When cpu hotplug emulation is enabled, register only the online
+	 * cpus; the rest are reserved for cpu probe.
+	 */
+	for_each_present_cpu(i) {
+		if ((cpu_hpe_on && cpu_online(i)) || !cpu_hpe_on)
+			arch_register_cpu(i);
+	}
 
 	return 0;
 }
Index: linux-hpe4/arch/x86/mm/numa_64.c
===================================================================
--- linux-hpe4.orig/arch/x86/mm/numa_64.c	2010-12-10 14:39:37.153331000 +0800
+++ linux-hpe4/arch/x86/mm/numa_64.c	2010-12-10 14:48:32.123331001 +0800
@@ -13,6 +13,7 @@
 #include <linux/module.h>
 #include <linux/nodemask.h>
 #include <linux/sched.h>
+#include <linux/cpu.h>
 
 #include <asm/e820.h>
 #include <asm/proto.h>
@@ -667,3 +668,17 @@
 		return __apicid_to_node[apicid];
 	return NUMA_NO_NODE;
 }
+
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+static __init int cpu_hpe_setup(char *opt)
+{
+	if (!opt)
+		return -EINVAL;
+
+	if (!strncmp(opt, "on", 2) || !strncmp(opt, "1", 1))
+		cpu_hpe_on = 1;
+
+	return 0;
+}
+early_param("cpu_hpe", cpu_hpe_setup);
+#endif  /* CONFIG_ARCH_CPU_PROBE_RELEASE */
Index: linux-hpe4/drivers/acpi/processor_driver.c
===================================================================
--- linux-hpe4.orig/drivers/acpi/processor_driver.c	2010-12-10 13:42:34.593331000 +0800
+++ linux-hpe4/drivers/acpi/processor_driver.c	2010-12-10 14:48:32.143331001 +0800
@@ -542,6 +542,14 @@
 		goto err_free_cpumask;
 
 	sysdev = get_cpu_sysdev(pr->id);
+	/*
+	 * The cpu is reserved for hotplug emulation; a reserved cpu can be
+	 * hot-added through the cpu probe interface. Return directly.
+	 */
+	if (sysdev == NULL) {
+		goto out;
+	}
+
 	if (sysfs_create_link(&device->dev.kobj, &sysdev->kobj, "sysdev")) {
 		result = -EFAULT;
 		goto err_remove_fs;
@@ -582,6 +590,7 @@
 		goto err_remove_sysfs;
 	}
 
+out:
 	return 0;
 
 err_remove_sysfs:
Index: linux-hpe4/drivers/base/cpu.c
===================================================================
--- linux-hpe4.orig/drivers/base/cpu.c	2010-12-10 14:39:43.333331000 +0800
+++ linux-hpe4/drivers/base/cpu.c	2010-12-10 14:48:32.143331001 +0800
@@ -22,9 +22,15 @@
 };
 EXPORT_SYMBOL(cpu_sysdev_class);
 
-static DEFINE_PER_CPU(struct sys_device *, cpu_sys_devices);
+DEFINE_PER_CPU(struct sys_device *, cpu_sys_devices);
 
 #ifdef CONFIG_HOTPLUG_CPU
+/*
+ * cpu_hpe_on is a switch to enable/disable cpu hotplug emulation. It is
+ * disabled by default; enable it through the grub parameter cpu_hpe=on.
+ */
+int cpu_hpe_on;
+
 static ssize_t show_online(struct sys_device *dev, struct sysdev_attribute *attr,
 			   char *buf)
 {
Index: linux-hpe4/include/linux/acpi.h
===================================================================
--- linux-hpe4.orig/include/linux/acpi.h	2010-12-10 13:42:34.613331000 +0800
+++ linux-hpe4/include/linux/acpi.h	2010-12-10 14:48:32.153331001 +0800
@@ -102,6 +102,7 @@
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 /* Arch dependent functions for cpu hotplug support */
 int acpi_map_lsapic(acpi_handle handle, int *pcpu);
+int acpi_map_lsapic_emu(int pcpu, int nid);
 int acpi_unmap_lsapic(int cpu);
 #endif /* CONFIG_ACPI_HOTPLUG_CPU */
 
Index: linux-hpe4/include/linux/cpu.h
===================================================================
--- linux-hpe4.orig/include/linux/cpu.h	2010-12-10 14:39:43.333331000 +0800
+++ linux-hpe4/include/linux/cpu.h	2010-12-10 14:48:32.153331001 +0800
@@ -25,6 +25,8 @@
 	struct sys_device sysdev;
 };
 
+DECLARE_PER_CPU(struct sys_device *, cpu_sys_devices);
+
 extern int register_cpu_node(struct cpu *cpu, int num, int nid);
 
 static inline int register_cpu(struct cpu *cpu, int num)
@@ -144,6 +146,7 @@
 #define register_hotcpu_notifier(nb)	register_cpu_notifier(nb)
 #define unregister_hotcpu_notifier(nb)	unregister_cpu_notifier(nb)
 int cpu_down(unsigned int cpu);
+extern int cpu_hpe_on;
 
 #ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
 extern void cpu_hotplug_driver_lock(void);
@@ -166,6 +169,7 @@
 /* These aren't inline functions due to a GCC bug. */
 #define register_hotcpu_notifier(nb)	({ (void)(nb); 0; })
 #define unregister_hotcpu_notifier(nb)	({ (void)(nb); })
+static int cpu_hpe_on;
 #endif		/* CONFIG_HOTPLUG_CPU */
 
 #ifdef CONFIG_PM_SLEEP_SMP
Index: linux-hpe4/Documentation/x86/x86_64/boot-options.txt
===================================================================
--- linux-hpe4.orig/Documentation/x86/x86_64/boot-options.txt	2010-12-10 14:39:37.153331000 +0800
+++ linux-hpe4/Documentation/x86/x86_64/boot-options.txt	2010-12-10 14:48:32.153331001 +0800
@@ -320,3 +320,8 @@
 		Do not use GB pages for kernel direct mappings.
 	gbpages
 		Use GB pages for kernel direct mappings.
+	cpu_hpe=on/off
+		Enable/disable software emulation of CPU hotplug. When cpu_hpe=on,
+		sysfs provides a probe/release interface to hot-add/remove CPUs dynamically.
+		Use maxcpus=<N> to reserve CPUs for later probing.
+		This option is disabled by default.

-- 
Thanks & Regards,
Shaohui
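
A remark on the "find first uninitialized cpu" loop in arch_cpu_probe():
"selected" starts at 0, so when no unregistered CPU is found the later
"selected >= num_possible_cpus()" check does not appear to catch it, and
CPU 0 would be passed on for registration. One way to make the "not found"
case explicit is a sentinel return value. The following is a userspace
model under stated assumptions, not kernel code: find_free_cpu() is a
hypothetical name, present[] stands in for cpu_present(), and registered[]
stands in for a non-NULL per_cpu(cpu_sys_devices, i).

```c
#include <stddef.h>

/*
 * Model of the search in arch_cpu_probe(): return the first cpu that is
 * present but not yet registered, or -1 if there is none.
 */
int find_free_cpu(const int *present, const int *registered, size_t ncpus)
{
	size_t i;

	for (i = 0; i < ncpus; i++)
		if (present[i] && !registered[i])
			return (int)i;

	return -1;	/* explicit "no free cpu" result */
}
```

A caller would then give up with an error whenever the result is negative,
instead of comparing against the number of possible CPUs.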



^ permalink raw reply	[flat|nested] 61+ messages in thread

* [6/7, v9] NUMA Hotplug Emulator: Fake CPU socket with logical CPU on x86
  2010-12-10  7:31 ` shaohui.zheng
@ 2010-12-10  7:31   ` shaohui.zheng
  -1 siblings, 0 replies; 61+ messages in thread
From: shaohui.zheng @ 2010-12-10  7:31 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
	dave, gregkh, Sam Ravnborg, Haicheng Li, Shaohui Zheng

[-- Attachment #1: 006-hotplug-emulator-fake_socket_with_logic_cpu_on_x86.patch --]
[-- Type: text/plain, Size: 7693 bytes --]

From: Shaohui Zheng <shaohui.zheng@intel.com>

When hot-plugging a CPU with the emulator, we use a logical CPU to emulate
the CPU hotplug process. On CPUs that support SMT, some logical CPUs share a
socket, but with the emulator they may end up in different NUMA nodes. This
misleads the scheduler into building an incorrect scheduling domain hierarchy,
and causes the following call trace when rebalancing the scheduling domains:

divide error: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu8/online
CPU 0 
Modules linked in: fbcon tileblit font bitblit softcursor radeon ttm drm_kms_helper e1000e usbhid via_rhine mii drm i2c_algo_bit igb dca
Pid: 0, comm: swapper Not tainted 2.6.32hpe #78 X8DTN
RIP: 0010:[<ffffffff81051da5>]  [<ffffffff81051da5>] find_busiest_group+0x6c5/0xa10
RSP: 0018:ffff880028203c30  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000015ac0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff880277e8cfa0 RDI: 0000000000000000
RBP: ffff880028203dc0 R08: ffff880277e8cfa0 R09: 0000000000000040
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f16cfc85770 CR3: 0000000001001000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff81822000, task ffffffff8184a600)
Stack:
 ffff880028203d60 ffff880028203cd0 ffff8801c204ff08 ffff880028203e38
<0> 0101ffff81018c59 ffff880028203e44 00000001810806bd ffff8801c204fe00
<0> 0000000528200000 ffffffff00000000 0000000000000018 0000000000015ac0
Call Trace:
 <IRQ> 
 [<ffffffff81088ee0>] ? tick_dev_program_event+0x40/0xd0
 [<ffffffff81053b2c>] rebalance_domains+0x17c/0x570
 [<ffffffff81018c89>] ? read_tsc+0x9/0x20
 [<ffffffff81088ee0>] ? tick_dev_program_event+0x40/0xd0
 [<ffffffff810569ed>] run_rebalance_domains+0xbd/0xf0
 [<ffffffff8106471f>] __do_softirq+0xaf/0x1e0
 [<ffffffff810b7d18>] ? handle_IRQ_event+0x58/0x160
 [<ffffffff810130ac>] call_softirq+0x1c/0x30
 [<ffffffff81014a85>] do_softirq+0x65/0xa0
 [<ffffffff810645cd>] irq_exit+0x7d/0x90
 [<ffffffff81013ff0>] do_IRQ+0x70/0xe0
 [<ffffffff810128d3>] ret_from_intr+0x0/0x11
 <EOI> 
 [<ffffffff8133387f>] ? acpi_idle_enter_bm+0x281/0x2b5
 [<ffffffff81333878>] ? acpi_idle_enter_bm+0x27a/0x2b5
 [<ffffffff8145dc8f>] ? cpuidle_idle_call+0x9f/0x130
 [<ffffffff81010e2b>] ? cpu_idle+0xab/0x100
 [<ffffffff8158aee6>] ? rest_init+0x66/0x70
 [<ffffffff81905d90>] ? start_kernel+0x3e3/0x3ef
 [<ffffffff8190533a>] ? x86_64_start_reservations+0x125/0x129
 [<ffffffff81905438>] ? x86_64_start_kernel+0xfa/0x109
Code: 00 00 e9 4c fb ff ff 0f 1f 80 00 00 00 00 48 8b b5 d8 fe ff ff 48 8b 45 a8 4d 29 ef 8b 56 08 48 c1 e0 0a 49 89 f0 48 89 d7 31 d2 <48> f7 f7 31 d2 48 89 45 a0 8b 76 08 4c 89 f0 48 c1 e0 0a 48 f7 
RIP  [<ffffffff81051da5>] find_busiest_group+0x6c5/0xa10
 RSP <ffff880028203c30>

Solution:

We put the logical CPU into a fake CPU socket and assign it a unique
phys_proc_id. Each fake socket holds only one logical CPU. This method
fixes the above bug.

CC: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
---
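
The id selection described under "Solution" can be modelled in userspace as
follows. This is a simplified sketch, not the kernel code: struct cpu_topo
is a hypothetical stand-in for the few struct cpuinfo_x86 fields involved,
and fake_socket() mirrors what fake_cpu_socket_info() in this patch does
(take the largest phys_proc_id among present CPUs and add one).

```c
#include <stddef.h>

/* Hypothetical stand-in for the struct cpuinfo_x86 fields used here. */
struct cpu_topo {
	int present;		/* models cpu_present() */
	int phys_proc_id;	/* socket id */
	int cpu_core_id;	/* core id within the socket */
	int cpu_probe_on;	/* set for emulated (probed) cpus */
};

/* Give one logical cpu its own fake socket and place it in core 0. */
void fake_socket(struct cpu_topo *cpus, size_t ncpus, size_t cpu)
{
	int max_id = 0;
	size_t i;

	/* find the largest socket id currently in use */
	for (i = 0; i < ncpus; i++)
		if (cpus[i].present && cpus[i].phys_proc_id > max_id)
			max_id = cpus[i].phys_proc_id;

	cpus[cpu].phys_proc_id = max_id + 1;	/* an id no other cpu has */
	cpus[cpu].cpu_core_id = 0;
	cpus[cpu].cpu_probe_on = 1;
}
```

Because every probed CPU gets max + 1, two probed CPUs never share a socket
id with each other or with a real socket.
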
Index: linux-hpe4/arch/x86/include/asm/processor.h
===================================================================
--- linux-hpe4.orig/arch/x86/include/asm/processor.h	2010-11-17 09:00:51.354100239 +0800
+++ linux-hpe4/arch/x86/include/asm/processor.h	2010-11-17 09:01:10.222837594 +0800
@@ -113,6 +113,15 @@
 	/* Index into per_cpu list: */
 	u16			cpu_index;
 #endif
+
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+	/*
+	 * Use a logical cpu to emulate a physical cpu's hotplug. We put the
+	 * logical cpu into a fake socket, assign a fake physical id to it,
+	 * and create a fake core.
+	 */
+	__u8		cpu_probe_on; /* A flag to enable cpu probe/release */
+#endif
 } __attribute__((__aligned__(SMP_CACHE_BYTES)));
 
 #define X86_VENDOR_INTEL	0
Index: linux-hpe4/arch/x86/kernel/smpboot.c
===================================================================
--- linux-hpe4.orig/arch/x86/kernel/smpboot.c	2010-11-17 09:01:10.202837209 +0800
+++ linux-hpe4/arch/x86/kernel/smpboot.c	2010-11-17 09:01:10.222837594 +0800
@@ -97,6 +97,7 @@
  */
 static DEFINE_MUTEX(x86_cpu_hotplug_driver_mutex);
 
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
 void cpu_hotplug_driver_lock()
 {
         mutex_lock(&x86_cpu_hotplug_driver_mutex);
@@ -106,6 +107,7 @@
 {
         mutex_unlock(&x86_cpu_hotplug_driver_mutex);
 }
+#endif
 
 #else
 static struct task_struct *idle_thread_array[NR_CPUS] __cpuinitdata ;
@@ -198,6 +200,8 @@
 {
 	int cpuid, phys_id;
 	unsigned long timeout;
+	u8 cpu_probe_on = 0;
+	struct cpuinfo_x86 *c;
 
 	/*
 	 * If waken up by an INIT in an 82489DX configuration
@@ -277,7 +281,20 @@
 	/*
 	 * Save our processor parameters
 	 */
+	c = &cpu_data(cpuid);
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+	cpu_probe_on = c->cpu_probe_on;
+	phys_id = c->phys_proc_id;
+#endif
+
 	smp_store_cpu_info(cpuid);
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+	if (cpu_probe_on) {
+		c->phys_proc_id = phys_id; /* restore the fake phys_proc_id */
+		c->cpu_core_id = 0; /* force the logical cpu to core 0 */
+		c->cpu_probe_on = cpu_probe_on;
+	}
+#endif
 
 	notify_cpu_starting(cpuid);
 
@@ -400,6 +417,11 @@
 {
 	int i;
 	struct cpuinfo_x86 *c = &cpu_data(cpu);
+	int cpu_probe_on = 0;
+
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+	cpu_probe_on = c->cpu_probe_on;
+#endif
 
 	cpumask_set_cpu(cpu, cpu_sibling_setup_mask);
 
@@ -431,7 +453,8 @@
 
 	for_each_cpu(i, cpu_sibling_setup_mask) {
 		if (per_cpu(cpu_llc_id, cpu) != BAD_APICID &&
-		    per_cpu(cpu_llc_id, cpu) == per_cpu(cpu_llc_id, i)) {
+		    per_cpu(cpu_llc_id, cpu) == per_cpu(cpu_llc_id, i) &&
+			cpu_probe_on == 0) {
 			cpumask_set_cpu(i, c->llc_shared_map);
 			cpumask_set_cpu(cpu, cpu_data(i).llc_shared_map);
 		}
Index: linux-hpe4/arch/x86/kernel/topology.c
===================================================================
--- linux-hpe4.orig/arch/x86/kernel/topology.c	2010-11-17 09:01:10.202837209 +0800
+++ linux-hpe4/arch/x86/kernel/topology.c	2010-11-17 09:01:10.222837594 +0800
@@ -70,6 +70,36 @@
 }
 EXPORT_SYMBOL(arch_unregister_cpu);
 
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+/*
+ * Put the logical cpu into a new socket, and encapsulate it in core 0.
+ */
+static void fake_cpu_socket_info(int cpu)
+{
+	struct cpuinfo_x86 *c = &cpu_data(cpu);
+	int i, phys_id = 0;
+
+	/* calculate the max phys_id */
+	for_each_present_cpu(i) {
+		struct cpuinfo_x86 *ci = &cpu_data(i);
+		if (phys_id < ci->phys_proc_id)
+			phys_id = ci->phys_proc_id;
+	}
+
+	c->phys_proc_id = phys_id + 1; /* pick an unused phys_proc_id */
+	c->cpu_core_id = 0; /* always put the logical cpu to core 0 */
+	c->cpu_probe_on = 1;
+}
+
+static void clear_cpu_socket_info(int cpu)
+{
+	struct cpuinfo_x86 *c = &cpu_data(cpu);
+	c->phys_proc_id = 0;
+	c->cpu_core_id = 0;
+	c->cpu_probe_on = 0;
+}
+
+
 ssize_t arch_cpu_probe(const char *buf, size_t count)
 {
 	int nid = 0;
@@ -109,6 +139,7 @@
 	/* register cpu */
 	arch_register_cpu_node(selected, nid);
 	acpi_map_lsapic_emu(selected, nid);
+	fake_cpu_socket_info(selected);
 
 	return count;
 }
@@ -132,10 +163,13 @@
 
 	arch_unregister_cpu(cpu);
 	acpi_unmap_lsapic(cpu);
+	clear_cpu_socket_info(cpu);
+	set_cpu_present(cpu, true);
 
 	return count;
 }
 EXPORT_SYMBOL(arch_cpu_release);
+#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
 
 #else /* CONFIG_HOTPLUG_CPU */
 

-- 
Thanks & Regards,
Shaohui



^ permalink raw reply	[flat|nested] 61+ messages in thread

* [6/7, v9] NUMA Hotplug Emulator: Fake CPU socket with logical CPU on x86
@ 2010-12-10  7:31   ` shaohui.zheng
  0 siblings, 0 replies; 61+ messages in thread
From: shaohui.zheng @ 2010-12-10  7:31 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
	dave, gregkh, Sam Ravnborg, Haicheng Li, Shaohui Zheng

[-- Attachment #1: 006-hotplug-emulator-fake_socket_with_logic_cpu_on_x86.patch --]
[-- Type: text/plain, Size: 7989 bytes --]

From: Shaohui Zheng <shaohui.zheng@intel.com>

When hot-plugging a CPU with the emulator, we use a logical CPU to emulate
the CPU hotplug process. On CPUs that support SMT, some logical CPUs share a
socket, but with the emulator they may end up in different NUMA nodes. This
misleads the scheduler into building an incorrect scheduling domain hierarchy,
and causes the following call trace when rebalancing the scheduling domains:

divide error: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu8/online
CPU 0 
Modules linked in: fbcon tileblit font bitblit softcursor radeon ttm drm_kms_helper e1000e usbhid via_rhine mii drm i2c_algo_bit igb dca
Pid: 0, comm: swapper Not tainted 2.6.32hpe #78 X8DTN
RIP: 0010:[<ffffffff81051da5>]  [<ffffffff81051da5>] find_busiest_group+0x6c5/0xa10
RSP: 0018:ffff880028203c30  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000015ac0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff880277e8cfa0 RDI: 0000000000000000
RBP: ffff880028203dc0 R08: ffff880277e8cfa0 R09: 0000000000000040
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f16cfc85770 CR3: 0000000001001000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff81822000, task ffffffff8184a600)
Stack:
 ffff880028203d60 ffff880028203cd0 ffff8801c204ff08 ffff880028203e38
<0> 0101ffff81018c59 ffff880028203e44 00000001810806bd ffff8801c204fe00
<0> 0000000528200000 ffffffff00000000 0000000000000018 0000000000015ac0
Call Trace:
 <IRQ> 
 [<ffffffff81088ee0>] ? tick_dev_program_event+0x40/0xd0
 [<ffffffff81053b2c>] rebalance_domains+0x17c/0x570
 [<ffffffff81018c89>] ? read_tsc+0x9/0x20
 [<ffffffff81088ee0>] ? tick_dev_program_event+0x40/0xd0
 [<ffffffff810569ed>] run_rebalance_domains+0xbd/0xf0
 [<ffffffff8106471f>] __do_softirq+0xaf/0x1e0
 [<ffffffff810b7d18>] ? handle_IRQ_event+0x58/0x160
 [<ffffffff810130ac>] call_softirq+0x1c/0x30
 [<ffffffff81014a85>] do_softirq+0x65/0xa0
 [<ffffffff810645cd>] irq_exit+0x7d/0x90
 [<ffffffff81013ff0>] do_IRQ+0x70/0xe0
 [<ffffffff810128d3>] ret_from_intr+0x0/0x11
 <EOI> 
 [<ffffffff8133387f>] ? acpi_idle_enter_bm+0x281/0x2b5
 [<ffffffff81333878>] ? acpi_idle_enter_bm+0x27a/0x2b5
 [<ffffffff8145dc8f>] ? cpuidle_idle_call+0x9f/0x130
 [<ffffffff81010e2b>] ? cpu_idle+0xab/0x100
 [<ffffffff8158aee6>] ? rest_init+0x66/0x70
 [<ffffffff81905d90>] ? start_kernel+0x3e3/0x3ef
 [<ffffffff8190533a>] ? x86_64_start_reservations+0x125/0x129
 [<ffffffff81905438>] ? x86_64_start_kernel+0xfa/0x109
Code: 00 00 e9 4c fb ff ff 0f 1f 80 00 00 00 00 48 8b b5 d8 fe ff ff 48 8b 45 a8 4d 29 ef 8b 56 08 48 c1 e0 0a 49 89 f0 48 89 d7 31 d2 <48> f7 f7 31 d2 48 89 45 a0 8b 76 08 4c 89 f0 48 c1 e0 0a 48 f7 
RIP  [<ffffffff81051da5>] find_busiest_group+0x6c5/0xa10
 RSP <ffff880028203c30>

Solution:

We put the logical CPU into a fake CPU socket and assign it a unique
phys_proc_id. Each fake socket holds only one logical CPU. This method
fixes the above bug.

CC: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
---
Index: linux-hpe4/arch/x86/include/asm/processor.h
===================================================================
--- linux-hpe4.orig/arch/x86/include/asm/processor.h	2010-11-17 09:00:51.354100239 +0800
+++ linux-hpe4/arch/x86/include/asm/processor.h	2010-11-17 09:01:10.222837594 +0800
@@ -113,6 +113,15 @@
 	/* Index into per_cpu list: */
 	u16			cpu_index;
 #endif
+
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+	/*
+	 * Use a logical cpu to emulate a physical cpu's hotplug. We put the
+	 * logical cpu into a fake socket, assign a fake physical id to it,
+	 * and create a fake core.
+	 */
+	__u8		cpu_probe_on; /* A flag to enable cpu probe/release */
+#endif
 } __attribute__((__aligned__(SMP_CACHE_BYTES)));
 
 #define X86_VENDOR_INTEL	0
Index: linux-hpe4/arch/x86/kernel/smpboot.c
===================================================================
--- linux-hpe4.orig/arch/x86/kernel/smpboot.c	2010-11-17 09:01:10.202837209 +0800
+++ linux-hpe4/arch/x86/kernel/smpboot.c	2010-11-17 09:01:10.222837594 +0800
@@ -97,6 +97,7 @@
  */
 static DEFINE_MUTEX(x86_cpu_hotplug_driver_mutex);
 
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
 void cpu_hotplug_driver_lock()
 {
         mutex_lock(&x86_cpu_hotplug_driver_mutex);
@@ -106,6 +107,7 @@
 {
         mutex_unlock(&x86_cpu_hotplug_driver_mutex);
 }
+#endif
 
 #else
 static struct task_struct *idle_thread_array[NR_CPUS] __cpuinitdata ;
@@ -198,6 +200,8 @@
 {
 	int cpuid, phys_id;
 	unsigned long timeout;
+	u8 cpu_probe_on = 0;
+	struct cpuinfo_x86 *c;
 
 	/*
 	 * If waken up by an INIT in an 82489DX configuration
@@ -277,7 +281,20 @@
 	/*
 	 * Save our processor parameters
 	 */
+	c = &cpu_data(cpuid);
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+	cpu_probe_on = c->cpu_probe_on;
+	phys_id = c->phys_proc_id;
+#endif
+
 	smp_store_cpu_info(cpuid);
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+	if (cpu_probe_on) {
+		c->phys_proc_id = phys_id; /* restore the fake phys_proc_id */
+		c->cpu_core_id = 0; /* force the logical cpu to core 0 */
+		c->cpu_probe_on = cpu_probe_on;
+	}
+#endif
 
 	notify_cpu_starting(cpuid);
 
@@ -400,6 +417,11 @@
 {
 	int i;
 	struct cpuinfo_x86 *c = &cpu_data(cpu);
+	int cpu_probe_on = 0;
+
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+	cpu_probe_on = c->cpu_probe_on;
+#endif
 
 	cpumask_set_cpu(cpu, cpu_sibling_setup_mask);
 
@@ -431,7 +453,8 @@
 
 	for_each_cpu(i, cpu_sibling_setup_mask) {
 		if (per_cpu(cpu_llc_id, cpu) != BAD_APICID &&
-		    per_cpu(cpu_llc_id, cpu) == per_cpu(cpu_llc_id, i)) {
+		    per_cpu(cpu_llc_id, cpu) == per_cpu(cpu_llc_id, i) &&
+			cpu_probe_on == 0) {
 			cpumask_set_cpu(i, c->llc_shared_map);
 			cpumask_set_cpu(cpu, cpu_data(i).llc_shared_map);
 		}
Index: linux-hpe4/arch/x86/kernel/topology.c
===================================================================
--- linux-hpe4.orig/arch/x86/kernel/topology.c	2010-11-17 09:01:10.202837209 +0800
+++ linux-hpe4/arch/x86/kernel/topology.c	2010-11-17 09:01:10.222837594 +0800
@@ -70,6 +70,36 @@
 }
 EXPORT_SYMBOL(arch_unregister_cpu);
 
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+/*
+ * Put the logical cpu into a new socket, and encapsulate it into core 0.
+ */
+static void fake_cpu_socket_info(int cpu)
+{
+	struct cpuinfo_x86 *c = &cpu_data(cpu);
+	int i, phys_id = 0;
+
+	/* calculate the max phys_id */
+	for_each_present_cpu(i) {
+		struct cpuinfo_x86 *ci = &cpu_data(i);
+		if (phys_id < ci->phys_proc_id)
+			phys_id = ci->phys_proc_id;
+	}
+
+	c->phys_proc_id = phys_id + 1; /* pick an unused phys_proc_id */
+	c->cpu_core_id = 0; /* always put the logical cpu to core 0 */
+	c->cpu_probe_on = 1;
+}
+
+static void clear_cpu_socket_info(int cpu)
+{
+	struct cpuinfo_x86 *c = &cpu_data(cpu);
+	c->phys_proc_id = 0;
+	c->cpu_core_id = 0;
+	c->cpu_probe_on = 0;
+}
+
+
 ssize_t arch_cpu_probe(const char *buf, size_t count)
 {
 	int nid = 0;
@@ -109,6 +139,7 @@
 	/* register cpu */
 	arch_register_cpu_node(selected, nid);
 	acpi_map_lsapic_emu(selected, nid);
+	fake_cpu_socket_info(selected);
 
 	return count;
 }
@@ -132,10 +163,13 @@
 
 	arch_unregister_cpu(cpu);
 	acpi_unmap_lsapic(cpu);
+	clear_cpu_socket_info(cpu);
+	set_cpu_present(cpu, true);
 
 	return count;
 }
 EXPORT_SYMBOL(arch_cpu_release);
+#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
 
 #else /* CONFIG_HOTPLUG_CPU */
 

-- 
Thanks & Regards,
Shaohui
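
The socket selection in fake_cpu_socket_info() can be tried outside the
kernel. What follows is only a userspace sketch under stated assumptions:
struct fake_cpuinfo and assign_fake_socket() are hypothetical names that stand
in for the cpuinfo_x86 fields and the helper in the patch; they are not kernel
APIs.

```c
#include <stddef.h>

/* Hypothetical stand-in for the cpuinfo_x86 fields the patch touches. */
struct fake_cpuinfo {
	int present;       /* non-zero if the CPU is present */
	int phys_proc_id;  /* package (socket) id */
	int cpu_core_id;   /* core id within the package */
	int cpu_probe_on;  /* set once the CPU has been given a fake socket */
};

/*
 * Give CPU `cpu` its own socket: take one more than the largest package id
 * among the present CPUs and place the CPU in core 0 of that socket.
 * Returns the package id that was assigned.
 */
static int assign_fake_socket(struct fake_cpuinfo *cpus, size_t ncpus,
			      size_t cpu)
{
	int max_id = 0;
	size_t i;

	for (i = 0; i < ncpus; i++) {
		if (cpus[i].present && cpus[i].phys_proc_id > max_id)
			max_id = cpus[i].phys_proc_id;
	}

	cpus[cpu].phys_proc_id = max_id + 1;
	cpus[cpu].cpu_core_id = 0;
	cpus[cpu].cpu_probe_on = 1;
	return cpus[cpu].phys_proc_id;
}
```

Because each probed CPU gets a package id that no other CPU has, it shares no
socket with the CPUs already present, which is what keeps the scheduler from
grouping it with them.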


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [7/7, v9] NUMA Hotplug Emulator: Implement per-node add_memory debugfs interface
  2010-12-10  7:31 ` shaohui.zheng
@ 2010-12-10  7:31   ` shaohui.zheng
  -1 siblings, 0 replies; 61+ messages in thread
From: shaohui.zheng @ 2010-12-10  7:31 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
	dave, gregkh, Haicheng Li, Shaohui Zheng

[-- Attachment #1: 007-hotplug-emulator-add-memory-debugfs-interface.patch --]
[-- Type: text/plain, Size: 4457 bytes --]

From:  Shaohui Zheng <shaohui.zheng@intel.com>

Add an add_memory interface under debugfs for each online node to support
memory hotplug emulation. Reserved memory can be added to the desired node
through this interface.

The layout on debugfs:
	mem_hotplug/node0/add_memory
	mem_hotplug/node1/add_memory
	mem_hotplug/node2/add_memory
	...

Add a memory section (128M) to node 3 (booted with mem=1024m):

	echo 0x40000000 > mem_hotplug/node3/add_memory

CC: David Rientjes <rientjes@google.com>
CC: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
---
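The address in the example follows from simple arithmetic. A rough userspace
sketch, assuming 128 MiB sections (2^27 bytes) and a physical memory map
without holes below the mem= limit; the helper names are hypothetical and real
machines may have holes that shift these numbers:

```c
#include <stdint.h>

/* Assumption: 128 MiB memory sections (2^27 bytes), as in the example. */
#define SECTION_SIZE_BITS 27
#define SECTION_SIZE (UINT64_C(1) << SECTION_SIZE_BITS)

/* Physical address just past the memory the kernel booted with (mem=<N>m). */
static uint64_t first_reserved_addr(uint64_t boot_mem_mib)
{
	return boot_mem_mib << 20;
}

/* Index of the memory section that contains phys_addr. */
static uint64_t section_nr(uint64_t phys_addr)
{
	return phys_addr >> SECTION_SIZE_BITS;
}

/* Non-zero if phys_addr sits on a section boundary. */
static int section_aligned(uint64_t phys_addr)
{
	return (phys_addr & (SECTION_SIZE - 1)) == 0;
}
```

With mem=1024m the first reserved address is 0x40000000, which is section
aligned and falls in section 8 under these assumptions.
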
Index: linux-hpe4/mm/memory_hotplug.c
===================================================================
--- linux-hpe4.orig/mm/memory_hotplug.c	2010-12-10 13:22:44.753331000 +0800
+++ linux-hpe4/mm/memory_hotplug.c	2010-12-10 13:41:48.803331000 +0800
@@ -933,6 +933,81 @@
 
 static struct dentry *memhp_debug_root;
 
+#ifdef CONFIG_ARCH_MEMORY_PROBE
+
+static ssize_t add_memory_store(struct file *file, const char __user *buf,
+				size_t count, loff_t *ppos)
+{
+	u64 phys_addr = 0;
+	int nid = (long)file->private_data;
+	int ret;
+
+	printk(KERN_INFO "Add a memory section to node: %d.\n", nid);
+	phys_addr = simple_strtoull(buf, NULL, 0);
+
+	ret = add_memory(nid, phys_addr, PAGES_PER_SECTION << PAGE_SHIFT);
+	if (ret)
+		count = ret;
+
+	return count;
+}
+
+static int add_memory_open(struct inode *inode, struct file *file)
+{
+	file->private_data = inode->i_private;
+	return 0;
+}
+
+static const struct file_operations add_memory_file_ops = {
+	.open		= add_memory_open,
+	.write		= add_memory_store,
+	.llseek		= generic_file_llseek,
+};
+
+/*
+ * Create add_memory debugfs entry under specified node
+ */
+static int debugfs_create_add_memory_entry(int nid)
+{
+	char buf[32];
+	static struct dentry *node_debug_root;
+
+	snprintf(buf, sizeof(buf), "node%d", nid);
+	node_debug_root = debugfs_create_dir(buf, memhp_debug_root);
+	if (!node_debug_root)
+		return -ENOMEM;
+
+	/* the nid is stored directly in the file's private data pointer */
+	if (!debugfs_create_file("add_memory", S_IWUSR, node_debug_root,
+			(void *)(long)nid, &add_memory_file_ops))
+		return -ENOMEM;
+
+	return 0;
+}
+
+static int __init memory_debug_init(void)
+{
+	int nid;
+
+	if (!memhp_debug_root)
+		memhp_debug_root = debugfs_create_dir("mem_hotplug", NULL);
+	if (!memhp_debug_root)
+		return -ENOMEM;
+
+	for_each_online_node(nid)
+		debugfs_create_add_memory_entry(nid);
+
+	return 0;
+}
+
+module_init(memory_debug_init);
+#else
+static inline int debugfs_create_add_memory_entry(int nid)
+{
+	return 0;
+}
+#endif /* CONFIG_ARCH_MEMORY_PROBE */
+
 static ssize_t add_node_store(struct file *file, const char __user *buf,
 				size_t count, loff_t *ppos)
 {
@@ -963,6 +1038,8 @@
 		return -ENOMEM;
 
 	ret = add_memory(nid, start, size);
+
+	debugfs_create_add_memory_entry(nid);
 	return ret ? ret : count;
 }
 
Index: linux-hpe4/Documentation/memory-hotplug.txt
===================================================================
--- linux-hpe4.orig/Documentation/memory-hotplug.txt	2010-12-10 13:22:44.733331000 +0800
+++ linux-hpe4/Documentation/memory-hotplug.txt	2010-12-10 13:42:12.783331002 +0800
@@ -19,6 +19,7 @@
   4.1 Hardware(Firmware) Support
   4.2 Notify memory hot-add event by hand
   4.3 Node hotplug emulation
+  4.4 Memory hotplug emulation
 5. Logical Memory hot-add phase
   5.1. State of memory
   5.2. How to online memory
@@ -239,6 +240,25 @@
 Once the new node has been added, it is possible to online the memory by
 toggling the "state" of its memory section(s) as described in section 5.1.
 
+4.4 Memory hotplug emulation
+------------
+With debugfs, it is possible to test memory hotplug in software: a memory
+section can be added to a chosen node through the add_memory interface, which
+the "probe" interface described in section 4.2 does not allow.
+
+There is an add_memory file for each online node under the debugfs mount
+point:
+	mem_hotplug/node0/add_memory
+	mem_hotplug/node1/add_memory
+	mem_hotplug/node2/add_memory
+	...
+
+Add a memory section (128M) to node 3 (booted with mem=1024m):
+
+	echo 0x40000000 > mem_hotplug/node3/add_memory
+
+Once the new memory section has been added, it is possible to online the memory
+by toggling the "state" described in section 5.1.
 
 ------------------------------
 5. Logical Memory hot-add phase

-- 
Thanks & Regards,
Shaohui
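
The add_memory file carries its node id in the inode's private pointer rather
than in an allocated structure. A minimal userspace sketch of that idea,
assuming node ids are small non-negative integers; nid_to_cookie() and
cookie_to_nid() are hypothetical names, and kernel code would typically cast
through long instead of intptr_t:

```c
#include <stddef.h>
#include <stdint.h>

/* Pack a small non-negative node id into a pointer-sized cookie. */
static void *nid_to_cookie(int nid)
{
	return (void *)(intptr_t)nid;
}

/* Recover the node id from a cookie made by nid_to_cookie(). */
static int cookie_to_nid(const void *cookie)
{
	return (int)(intptr_t)cookie;
}
```

Node 0 maps to a NULL cookie, so the reader of the cookie must not treat NULL
as "no data".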



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [5/7, v9] NUMA Hotplug Emulator: Support cpu probe/release in x86_64
  2010-12-10  7:31   ` shaohui.zheng
  (?)
@ 2010-12-16 16:25   ` Eric B Munson
  2010-12-16 23:34       ` Shaohui Zheng
  -1 siblings, 1 reply; 61+ messages in thread
From: Eric B Munson @ 2010-12-16 16:25 UTC (permalink / raw)
  To: shaohui.zheng
  Cc: akpm, linux-mm, linux-kernel, haicheng.li, lethal, ak,
	shaohui.zheng, rientjes, dave, gregkh, Ingo Molnar, Len Brown,
	Yinghai Lu, Tejun Heo, Haicheng Li

[-- Attachment #1: Type: text/plain, Size: 12438 bytes --]

Shaohui,

What kernel is this series based on?  I cannot get it to build when applied
to mainline.  I seem to be missing a definition for set_apicid_to_node.

Eric

On Fri, 10 Dec 2010, shaohui.zheng@intel.com wrote:

> From: Shaohui Zheng <shaohui.zheng@intel.com>
> 
> Physical CPU hot-add/hot-remove is supported on some hardware, and the
> current Linux kernel already supports it. The NUMA Hotplug Emulator provides
> a mechanism to emulate the process in software. It can be used for testing
> or debugging purposes.
> 
> Physical CPU hotplug is different from logical CPU online/offline. Logical
> online/offline is controlled through /sys/devices/system/cpu/cpuX/online. The
> CPU hotplug emulator uses the probe/release interface, which makes it possible
> to automate and stress-test CPU hotplug.
> 
> Add cpu interface probe/release under sysfs for x86_64. User can use this
> interface to emulate the cpu hot-add and hot-remove process.
> 
> Usage:
> *) Reserve CPUs with a boot parameter such as:
> 	maxcpus=4
> 
> The remaining CPUs will not be initialized.
> 
> *) Probe a CPU
> Use the probe interface to hot-add a new CPU:
> 	echo nid > /sys/devices/system/cpu/probe
> 
> *) Release a CPU
> 	echo cpu > /sys/devices/system/cpu/release
> 
> A reserved CPU will be hot-added to the specified node.
> 1) nid == 0, the CPU will be added to the real node which the CPU
> should be in
> 2) nid != 0, add the CPU to node nid even though it is a fake node.
> 
> CC: Ingo Molnar <mingo@elte.hu>
> CC: Len Brown <len.brown@intel.com>
> CC: Yinghai Lu <Yinghai.Lu@Sun.COM>
> CC: Tejun Heo <tj@kernel.org>
> Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
> Signed-off-by: Haicheng Li <haicheng.li@intel.com>
> ---
> This patch is based on Tejun's unification of the 32 and 64 bit NUMA boot paths,
>  specifically the patch at http://marc.info/?l=linux-kernel&m=129087151912379.
> Index: linux-hpe4/arch/x86/kernel/acpi/boot.c
> ===================================================================
> --- linux-hpe4.orig/arch/x86/kernel/acpi/boot.c	2010-12-10 13:42:34.553331000 +0800
> +++ linux-hpe4/arch/x86/kernel/acpi/boot.c	2010-12-10 14:48:32.113331001 +0800
> @@ -668,8 +668,39 @@
>  }
>  EXPORT_SYMBOL(acpi_map_lsapic);
>  
> +#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
> +static void acpi_map_cpu2node_emu(int cpu, int physid, int nid)
> +{
> +#ifdef CONFIG_ACPI_NUMA
> +	set_apicid_to_node(physid, nid);
> +	numa_set_node(cpu, nid);
> +#endif
> +}
> +
> +static u16 cpu_to_apicid_saved[CONFIG_NR_CPUS];
> +int __ref acpi_map_lsapic_emu(int pcpu, int nid)
> +{
> +	/* backup cpu apicid to array cpu_to_apicid_saved */
> +	if (cpu_to_apicid_saved[pcpu] == 0 &&
> +		per_cpu(x86_cpu_to_apicid, pcpu) != BAD_APICID)
> +		cpu_to_apicid_saved[pcpu] = per_cpu(x86_cpu_to_apicid, pcpu);
> +
> +	per_cpu(x86_cpu_to_apicid, pcpu) = cpu_to_apicid_saved[pcpu];
> +	acpi_map_cpu2node_emu(pcpu, per_cpu(x86_cpu_to_apicid, pcpu), nid);
> +
> +	return pcpu;
> +}
> +EXPORT_SYMBOL(acpi_map_lsapic_emu);
> +#endif
> +
>  int acpi_unmap_lsapic(int cpu)
>  {
> +#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
> +	/* backup cpu apicid to array cpu_to_apicid_saved */
> +	if (cpu_to_apicid_saved[cpu] == 0 &&
> +		per_cpu(x86_cpu_to_apicid, cpu) != BAD_APICID)
> +		cpu_to_apicid_saved[cpu] = per_cpu(x86_cpu_to_apicid, cpu);
> +#endif
>  	per_cpu(x86_cpu_to_apicid, cpu) = -1;
>  	set_cpu_present(cpu, false);
>  	num_processors--;
> Index: linux-hpe4/arch/x86/kernel/smpboot.c
> ===================================================================
> --- linux-hpe4.orig/arch/x86/kernel/smpboot.c	2010-12-10 13:42:34.563331000 +0800
> +++ linux-hpe4/arch/x86/kernel/smpboot.c	2010-12-10 14:48:32.113331001 +0800
> @@ -103,8 +103,6 @@
>          mutex_unlock(&x86_cpu_hotplug_driver_mutex);
>  }
>  
> -ssize_t arch_cpu_probe(const char *buf, size_t count) { return -1; }
> -ssize_t arch_cpu_release(const char *buf, size_t count) { return -1; }
>  #else
>  static struct task_struct *idle_thread_array[NR_CPUS] __cpuinitdata ;
>  #define get_idle_for_cpu(x)      (idle_thread_array[(x)])
> Index: linux-hpe4/arch/x86/kernel/topology.c
> ===================================================================
> --- linux-hpe4.orig/arch/x86/kernel/topology.c	2010-12-10 14:39:43.333331000 +0800
> +++ linux-hpe4/arch/x86/kernel/topology.c	2010-12-10 14:49:56.043331000 +0800
> @@ -30,6 +30,9 @@
>  #include <linux/init.h>
>  #include <linux/smp.h>
>  #include <asm/cpu.h>
> +#include <linux/cpu.h>
> +#include <linux/topology.h>
> +#include <linux/acpi.h>
>  
>  static DEFINE_PER_CPU(struct x86_cpu, cpu_devices);
>  
> @@ -66,6 +69,78 @@
>  	unregister_cpu(&per_cpu(cpu_devices, num).cpu);
>  }
>  EXPORT_SYMBOL(arch_unregister_cpu);
> +
> +ssize_t arch_cpu_probe(const char *buf, size_t count)
> +{
> +	int nid = 0;
> +	int num = 0, selected = 0;
> +
> +	/* check parameters */
> +	if (!buf || count < 2)
> +		return -EPERM;
> +
> +	nid = simple_strtoul(buf, NULL, 0);
> +	printk(KERN_DEBUG "Add a cpu to node : %d\n", nid);
> +
> +	if (nid < 0 || nid > nr_node_ids - 1) {
> +		printk(KERN_ERR "Invalid NUMA node id: %d (0 <= nid < %d).\n",
> +			nid, nr_node_ids);
> +		return -EPERM;
> +	}
> +
> +	if (!node_online(nid)) {
> +		printk(KERN_ERR "NUMA node %d is not online, give up.\n", nid);
> +		return -EPERM;
> +	}
> +
> +	/* find first uninitialized cpu */
> +	for_each_present_cpu(num) {
> +		if (per_cpu(cpu_sys_devices, num) == NULL) {
> +			selected = num;
> +			break;
> +		}
> +	}
> +
> +	if (selected >= num_possible_cpus()) {
> +		printk(KERN_ERR "No free cpu, give up cpu probing.\n");
> +		return -EPERM;
> +	}
> +
> +	/* register cpu */
> +	arch_register_cpu_node(selected, nid);
> +	acpi_map_lsapic_emu(selected, nid);
> +
> +	return count;
> +}
> +EXPORT_SYMBOL(arch_cpu_probe);
> +
> +ssize_t arch_cpu_release(const char *buf, size_t count)
> +{
> +	int cpu = 0;
> +
> +	cpu =  simple_strtoul(buf, NULL, 0);
> +	/* cpu 0 is not hotplugable */
> +	if (cpu == 0) {
> +		printk(KERN_ERR "can not release cpu 0.\n");
> +		return -EPERM;
> +	}
> +
> +	if (cpu_online(cpu)) {
> +		printk(KERN_DEBUG "offline cpu %d.\n", cpu);
> +		if (cpu_down(cpu)) {
> +			printk(KERN_ERR "fail to offline cpu %d, give up.\n", cpu);
> +			return -EPERM;
> +		}
> +
> +	}
> +
> +	arch_unregister_cpu(cpu);
> +	acpi_unmap_lsapic(cpu);
> +
> +	return count;
> +}
> +EXPORT_SYMBOL(arch_cpu_release);
> +
>  #else /* CONFIG_HOTPLUG_CPU */
>  
>  static int __init arch_register_cpu(int num)
> @@ -83,8 +158,14 @@
>  		register_one_node(i);
>  #endif
>  
> -	for_each_present_cpu(i)
> -		arch_register_cpu(i);
> +	/*
> +	 * When cpu hotplug emulation is enabled, register only the online cpus;
> +	 * the rest are reserved for cpu probe.
> +	 */
> +	for_each_present_cpu(i) {
> +		if ((cpu_hpe_on && cpu_online(i)) || !cpu_hpe_on)
> +			arch_register_cpu(i);
> +	}
>  
>  	return 0;
>  }
> Index: linux-hpe4/arch/x86/mm/numa_64.c
> ===================================================================
> --- linux-hpe4.orig/arch/x86/mm/numa_64.c	2010-12-10 14:39:37.153331000 +0800
> +++ linux-hpe4/arch/x86/mm/numa_64.c	2010-12-10 14:48:32.123331001 +0800
> @@ -13,6 +13,7 @@
>  #include <linux/module.h>
>  #include <linux/nodemask.h>
>  #include <linux/sched.h>
> +#include <linux/cpu.h>
>  
>  #include <asm/e820.h>
>  #include <asm/proto.h>
> @@ -667,3 +668,17 @@
>  		return __apicid_to_node[apicid];
>  	return NUMA_NO_NODE;
>  }
> +
> +#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
> +static __init int cpu_hpe_setup(char *opt)
> +{
> +	if (!opt)
> +		return -EINVAL;
> +
> +	if (!strncmp(opt, "on", 2) || !strncmp(opt, "1", 1))
> +		cpu_hpe_on = 1;
> +
> +	return 0;
> +}
> +early_param("cpu_hpe", cpu_hpe_setup);
> +#endif  /* CONFIG_ARCH_CPU_PROBE_RELEASE */
> Index: linux-hpe4/drivers/acpi/processor_driver.c
> ===================================================================
> --- linux-hpe4.orig/drivers/acpi/processor_driver.c	2010-12-10 13:42:34.593331000 +0800
> +++ linux-hpe4/drivers/acpi/processor_driver.c	2010-12-10 14:48:32.143331001 +0800
> @@ -542,6 +542,14 @@
>  		goto err_free_cpumask;
>  
>  	sysdev = get_cpu_sysdev(pr->id);
> +	/*
> +	 * Reserve cpu for hotplug emulation, the reserved cpu can be hot-added
> +	 * through the cpu probe interface. Return directly.
> +	 */
> +	if (sysdev == NULL) {
> +		goto out;
> +	}
> +
>  	if (sysfs_create_link(&device->dev.kobj, &sysdev->kobj, "sysdev")) {
>  		result = -EFAULT;
>  		goto err_remove_fs;
> @@ -582,6 +590,7 @@
>  		goto err_remove_sysfs;
>  	}
>  
> +out:
>  	return 0;
>  
>  err_remove_sysfs:
> Index: linux-hpe4/drivers/base/cpu.c
> ===================================================================
> --- linux-hpe4.orig/drivers/base/cpu.c	2010-12-10 14:39:43.333331000 +0800
> +++ linux-hpe4/drivers/base/cpu.c	2010-12-10 14:48:32.143331001 +0800
> @@ -22,9 +22,15 @@
>  };
>  EXPORT_SYMBOL(cpu_sysdev_class);
>  
> -static DEFINE_PER_CPU(struct sys_device *, cpu_sys_devices);
> +DEFINE_PER_CPU(struct sys_device *, cpu_sys_devices);
>  
>  #ifdef CONFIG_HOTPLUG_CPU
> +/*
> + * cpu_hpe_on is a switch to enable/disable cpu hotplug emulation. It is
> + * disabled by default; enable it with the boot parameter cpu_hpe=on.
> + */
> +int cpu_hpe_on;
> +
>  static ssize_t show_online(struct sys_device *dev, struct sysdev_attribute *attr,
>  			   char *buf)
>  {
> Index: linux-hpe4/include/linux/acpi.h
> ===================================================================
> --- linux-hpe4.orig/include/linux/acpi.h	2010-12-10 13:42:34.613331000 +0800
> +++ linux-hpe4/include/linux/acpi.h	2010-12-10 14:48:32.153331001 +0800
> @@ -102,6 +102,7 @@
>  #ifdef CONFIG_ACPI_HOTPLUG_CPU
>  /* Arch dependent functions for cpu hotplug support */
>  int acpi_map_lsapic(acpi_handle handle, int *pcpu);
> +int acpi_map_lsapic_emu(int pcpu, int nid);
>  int acpi_unmap_lsapic(int cpu);
>  #endif /* CONFIG_ACPI_HOTPLUG_CPU */
>  
> Index: linux-hpe4/include/linux/cpu.h
> ===================================================================
> --- linux-hpe4.orig/include/linux/cpu.h	2010-12-10 14:39:43.333331000 +0800
> +++ linux-hpe4/include/linux/cpu.h	2010-12-10 14:48:32.153331001 +0800
> @@ -25,6 +25,8 @@
>  	struct sys_device sysdev;
>  };
>  
> +DECLARE_PER_CPU(struct sys_device *, cpu_sys_devices);
> +
>  extern int register_cpu_node(struct cpu *cpu, int num, int nid);
>  
>  static inline int register_cpu(struct cpu *cpu, int num)
> @@ -144,6 +146,7 @@
>  #define register_hotcpu_notifier(nb)	register_cpu_notifier(nb)
>  #define unregister_hotcpu_notifier(nb)	unregister_cpu_notifier(nb)
>  int cpu_down(unsigned int cpu);
> +extern int cpu_hpe_on;
>  
>  #ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
>  extern void cpu_hotplug_driver_lock(void);
> @@ -166,6 +169,7 @@
>  /* These aren't inline functions due to a GCC bug. */
>  #define register_hotcpu_notifier(nb)	({ (void)(nb); 0; })
>  #define unregister_hotcpu_notifier(nb)	({ (void)(nb); })
> +static int cpu_hpe_on;
>  #endif		/* CONFIG_HOTPLUG_CPU */
>  
>  #ifdef CONFIG_PM_SLEEP_SMP
> Index: linux-hpe4/Documentation/x86/x86_64/boot-options.txt
> ===================================================================
> --- linux-hpe4.orig/Documentation/x86/x86_64/boot-options.txt	2010-12-10 14:39:37.153331000 +0800
> +++ linux-hpe4/Documentation/x86/x86_64/boot-options.txt	2010-12-10 14:48:32.153331001 +0800
> @@ -320,3 +320,8 @@
>  		Do not use GB pages for kernel direct mappings.
>  	gbpages
>  		Use GB pages for kernel direct mappings.
> +	cpu_hpe=on/off
> +		Enable/disable CPU hotplug emulation with software method. When cpu_hpe=on,
> +		sysfs provides probe/release interface to hot add/remove CPUs dynamically.
> +		We can use maxcpus=<N> to reserve CPUs.
> +		This option is disabled by default.
> 
> -- 
> Thanks & Regards,
> Shaohui
> 
> 

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [5/7, v9] NUMA Hotplug Emulator: Support cpu probe/release in x86_64
  2010-12-16 16:25   ` Eric B Munson
@ 2010-12-16 23:34       ` Shaohui Zheng
  0 siblings, 0 replies; 61+ messages in thread
From: Shaohui Zheng @ 2010-12-16 23:34 UTC (permalink / raw)
  To: Eric B Munson
  Cc: shaohui.zheng, akpm, linux-mm, linux-kernel, haicheng.li, lethal,
	ak, rientjes, dave, gregkh, Ingo Molnar, Len Brown, Yinghai Lu,
	Tejun Heo, Haicheng Li

On Thu, Dec 16, 2010 at 09:25:41AM -0700, Eric B Munson wrote:
> Shaohui,
> 
> What kernel is this series based on?  I cannot get it to build when applied
> to mainline.  I seem to be missing a definition for set_apicid_to_node.
> 
> Eric
> 

Eric,
	There is a code conflict with Tejun's NUMA unification code, and Tejun's code is still under
review. This patchset resolves the conflict: the v9 emulator is based on his patches, so we
need to wait until his patches are accepted.

Tejun's patch: http://marc.info/?l=linux-kernel&m=129087151912379.

	If you are doing some testing, you can use the v8 emulator in the meantime.

-- 
Thanks & Regards,
Shaohui


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [2/7, v9] NUMA Hotplug Emulator: Add numa=possible option
  2010-12-10  7:31   ` shaohui.zheng
@ 2010-12-23  0:27     ` Andrew Morton
  -1 siblings, 0 replies; 61+ messages in thread
From: Andrew Morton @ 2010-12-23  0:27 UTC (permalink / raw)
  To: shaohui.zheng
  Cc: linux-mm, linux-kernel, haicheng.li, lethal, ak, shaohui.zheng,
	rientjes, dave, gregkh, Haicheng Li

On Fri, 10 Dec 2010 15:31:21 +0800
shaohui.zheng@intel.com wrote:

> @@ -646,6 +647,15 @@ void __init initmem_init(unsigned long start_pfn, unsigned long last_pfn,
>  		numa_set_node(i, 0);
>  	memblock_x86_register_active_regions(0, start_pfn, last_pfn);
>  	setup_node_bootmem(0, start_pfn << PAGE_SHIFT, last_pfn << PAGE_SHIFT);
> +out: __maybe_unused

hm, I didn't know you could do that with labels.

Does it work?

> +	for (i = 0; i < numa_possible_nodes; i++) {
> +		int nid;
> +
> +		nid = first_unset_node(node_possible_map);
> +		if (nid == MAX_NUMNODES)
> +			break;
> +		node_set(nid, node_possible_map);
> +	}
>  }
>  
>  unsigned long __init numa_free_all_bootmem(void)
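
On the label question above: GCC documents an "unused" attribute for labels,
and the kernel's __maybe_unused expands to that attribute. A small userspace
sketch for experimenting with it, assuming GCC or Clang; MAYBE_UNUSED,
USE_EARLY_EXIT and count_nodes() are hypothetical names used only for this
illustration. The semicolon after the attribute is kept so the same form is
also accepted by GNU C++.

```c
/* Local stand-in for the kernel's __maybe_unused (GCC/Clang attribute). */
#define MAYBE_UNUSED __attribute__((unused))

/* Set to 1 to model a configuration in which the goto is compiled in. */
#ifndef USE_EARLY_EXIT
#define USE_EARLY_EXIT 0
#endif

static int count_nodes(int wanted, int max)
{
	int n = 0;

#if USE_EARLY_EXIT
	if (wanted <= 0)
		goto out;
#endif
	n = wanted;

out: MAYBE_UNUSED;
	/* With USE_EARLY_EXIT == 0 nothing jumps to the label above. */
	if (n > max)
		n = max;
	return n;
}
```

Building this with -Wall, once with and once without -DUSE_EARLY_EXIT=1, is a
quick way to check whether the attribute silences the unused-label warning on
a given compiler.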

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [3/7, v9] NUMA Hotplug Emulator: Add node hotplug emulation
  2010-12-10  7:31   ` shaohui.zheng
@ 2010-12-23  0:27     ` Andrew Morton
  -1 siblings, 0 replies; 61+ messages in thread
From: Andrew Morton @ 2010-12-23  0:27 UTC (permalink / raw)
  To: shaohui.zheng
  Cc: linux-mm, linux-kernel, haicheng.li, lethal, ak, shaohui.zheng,
	rientjes, dave, gregkh, Haicheng Li

On Fri, 10 Dec 2010 15:31:22 +0800
shaohui.zheng@intel.com wrote:

> From: David Rientjes <rientjes@google.com>
> 
> Add an interface to allow new nodes to be added when performing memory
> hot-add.  This provides a convenient interface to test memory hotplug
> notifier callbacks and surrounding hotplug code when new nodes are
> onlined without actually having a machine with such hotpluggable SRAT
> entries.
> 
> This adds a new debugfs interface at /sys/kernel/debug/mem_hotplug/add_node
> that behaves in a similar way to the memory hot-add "probe" interface.
> Its format is size@start, where "size" is the size of the new node to be
> added and "start" is the physical address of the new memory.
> 
> The new node id is a currently offline, but possible, node.  The bit must
> be set in node_possible_map so that nr_node_ids is sized appropriately.
> 
> For emulation on x86, for example, it would be possible to set aside
> memory for hotplugged nodes (say, anything above 2G) and to add an
> additional four nodes as being possible on boot with
> 
> 	mem=2G numa=possible=4
> 
> and then creating a new 128M node at runtime:
> 
> 	# echo 128M@0x80000000 > /sys/kernel/debug/mem_hotplug/add_node
> 	On node 1 totalpages: 0
> 	init_memory_mapping: 0000000080000000-0000000088000000
> 	 0080000000 - 0088000000 page 2M
> 
> Once the new node has been added, its memory can be onlined.  If this
> memory represents memory section 16, for example:
> 
> 	# echo online > /sys/devices/system/memory/memory16/state
> 	Built 2 zonelists in Node order, mobility grouping on.  Total pages: 514846
> 	Policy zone: Normal
> 
>  [ The memory section(s) mapped to a particular node are visible via
>    /sys/kernel/debug/mem_hotplug/node1, in this example. ]
> 
> The new node is now hotplugged and ready for testing.
> 
> CC: Haicheng Li <haicheng.li@intel.com>
> CC: Greg KH <gregkh@suse.de>
> Signed-off-by: David Rientjes <rientjes@google.com>
> Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
> ---
>  Documentation/memory-hotplug.txt |   24 +++++++++++++++
>  mm/memory_hotplug.c              |   59 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 83 insertions(+), 0 deletions(-)
> Index: linux-hpe4/Documentation/memory-hotplug.txt
> ===================================================================
> --- linux-hpe4.orig/Documentation/memory-hotplug.txt	2010-11-30 12:40:43.527622001 +0800
> +++ linux-hpe4/Documentation/memory-hotplug.txt	2010-11-30 14:11:11.827622000 +0800
> @@ -18,6 +18,7 @@
>  4. Physical memory hot-add phase
>    4.1 Hardware(Firmware) Support
>    4.2 Notify memory hot-add event by hand
> +  4.3 Node hotplug emulation
>  5. Logical Memory hot-add phase
>    5.1. State of memory
>    5.2. How to online memory
> @@ -215,6 +216,29 @@
>  Please see "How to online memory" in this text.
>  
>  
> +4.3 Node hotplug emulation
> +------------
> +With debugfs, it is possible to test node hotplug by assigning the newly
> +added memory to a new node id when using a different interface with a similar
> +behavior to "probe" described in section 4.2.  If a node id is possible
> +(there are bits in /sys/devices/system/memory/possible that are not online),
> +then it may be used to emulate a newly added node as the result of memory
> +hotplug by using the debugfs "add_node" interface.
> +
> +The add_node interface is located at "mem_hotplug/add_node" at the debugfs
> +mount point.
> +
> +You can create a new node of a specified size starting at the physical
> +address of new memory by
> +
> +% echo size@start_address_of_new_memory > /sys/kernel/debug/mem_hotplug/add_node
> +
> +Where "size" can be represented in megabytes or gigabytes (for example,
> +"128M" or "1G").  The minimum size is that of a memory section.
> +
> +Once the new node has been added, it is possible to online the memory by
> +toggling the "state" of its memory section(s) as described in section 5.1.
> +
>  
>  ------------------------------
>  5. Logical Memory hot-add phase
> Index: linux-hpe4/mm/memory_hotplug.c
> ===================================================================
> --- linux-hpe4.orig/mm/memory_hotplug.c	2010-11-30 12:40:43.757622001 +0800
> +++ linux-hpe4/mm/memory_hotplug.c	2010-11-30 14:02:33.877622002 +0800
> @@ -924,3 +924,63 @@
>  }
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>  EXPORT_SYMBOL_GPL(remove_memory);
> +
> +#ifdef CONFIG_DEBUG_FS
> +#include <linux/debugfs.h>
> +
> +static struct dentry *memhp_debug_root;
> +
> +static ssize_t add_node_store(struct file *file, const char __user *buf,
> +				size_t count, loff_t *ppos)
> +{
> +	nodemask_t mask;

NODEMASK_ALLOC()?

> +	u64 start, size;
> +	char buffer[64];
> +	char *p;
> +	int nid;
> +	int ret;
> +
> +	memset(buffer, 0, sizeof(buffer));
> +	if (count > sizeof(buffer) - 1)
> +		count = sizeof(buffer) - 1;

This will cause the write to return a smaller number than `count': a
short write.  Some userspace code may then decide to write the
remainder of the data (which is the correct way to use the write()
syscall).

Could be a bit dangerous, and perhaps simply declaring an error if too
much data was written would be a better approach.
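
A minimal user-space sketch of the "declare an error" alternative, under the assumption that -EINVAL is an acceptable error for oversized input; store_command() and CMD_BUF_SIZE are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

#define CMD_BUF_SIZE 64

/*
 * Stand-in for a write handler: copy the whole request into a
 * NUL-terminated buffer, or fail without consuming anything.
 */
ssize_t store_command(char dst[CMD_BUF_SIZE], const char *src, size_t count)
{
    assert(dst && src);

    /* Refuse oversized input instead of silently truncating it. */
    if (count > CMD_BUF_SIZE - 1)
        return -EINVAL;

    memcpy(dst, src, count);
    dst[count] = '\0';

    /* The full count is reported, so callers never see a short write. */
    return (ssize_t)count;
}
```

With this shape a caller that loops on write() either gets the whole request accepted or a clear error, and never resubmits a truncated tail as a new command.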

> +	if (copy_from_user(buffer, buf, count))
> +		return -EFAULT;
> +
> +	size = memparse(buffer, &p);
> +	if (size < (PAGES_PER_SECTION << PAGE_SHIFT))

PAGES_PER_SECTION has type unsigned long, so the rhs of this comparison
might overflow on 32-bit, should anyone ever try to use this code on
32-bit.

otoh the compiler might do it as 64-bit because the lhs is 64-bit.  Not
sure.
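
As far as the C rules go, the result type of a shift is the promoted type of its left operand, so the 64-bit lhs of the comparison does not widen the shift itself. A small sketch of the difference, using uint32_t to stand in for a 32-bit unsigned long; the function names and DEMO_PAGE_SHIFT are made up for the example:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12

/* Shift is done in 32 bits and only then widened: high bits are already lost. */
uint64_t section_bytes_narrow(uint32_t pages_per_section)
{
    return pages_per_section << DEMO_PAGE_SHIFT;
}

/* Widen first, then shift: the full 64-bit result survives. */
uint64_t section_bytes_wide(uint32_t pages_per_section)
{
    return (uint64_t)pages_per_section << DEMO_PAGE_SHIFT;
}

/* For a small section size both forms agree, which is why this rarely bites. */
void check_small_section(void)
{
    assert(section_bytes_narrow(1u << 15) == section_bytes_wide(1u << 15));
}
```

Casting the left operand to u64 before the shift would sidestep the question on any word size.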

> +		return -EINVAL;
> +	if (*p != '@')
> +		return -EINVAL;
> +
> +	start = simple_strtoull(p + 1, NULL, 0);

You disagreed with checkpatch?
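
The checkpatch complaint here is presumably the one steering callers away from simple_strtoull() towards a strict parser, because the simple variant ignores trailing garbage. A user-space sketch of strict "size@start" parsing, assuming only the standard strtoull(); parse_u64() and parse_size_at_start() are hypothetical helpers, and the K/M/G handling only imitates memparse() without overflow checks:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse an unsigned number; *end is set to the first unparsed character. */
static int parse_u64(const char *s, char **end, uint64_t *out)
{
    unsigned long long v;

    if (*s < '0' || *s > '9')       /* strtoull would accept "-1" or " 1" */
        return -EINVAL;
    errno = 0;
    v = strtoull(s, end, 0);
    if (errno == ERANGE)
        return -ERANGE;
    *out = v;
    return 0;
}

/* Parse "size@start", e.g. "128M@0x80000000", rejecting trailing garbage. */
int parse_size_at_start(const char *s, uint64_t *size, uint64_t *start)
{
    char *p;
    int ret;

    assert(s && size && start);

    ret = parse_u64(s, &p, size);
    if (ret)
        return ret;

    /* Optional suffix: each case adds another factor of 1024. */
    switch (*p) {
    case 'G': case 'g':
        *size <<= 10;
        /* fall through */
    case 'M': case 'm':
        *size <<= 10;
        /* fall through */
    case 'K': case 'k':
        *size <<= 10;
        p++;
        break;
    default:
        break;
    }

    if (*p != '@')
        return -EINVAL;

    ret = parse_u64(p + 1, &p, start);
    if (ret)
        return ret;

    /* Allow the newline that echo appends, and nothing else. */
    if (*p == '\n')
        p++;
    return *p == '\0' ? 0 : -EINVAL;
}
```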

> +	nodes_andnot(mask, node_possible_map, node_online_map);
> +	nid = first_node(mask);
> +	if (nid == MAX_NUMNODES)
> +		return -ENOMEM;
> +
> +	ret = add_memory(nid, start, size);
> +	return ret ? ret : count;
> +}
> +
> +static const struct file_operations add_node_file_ops = {
> +	.write		= add_node_store,
> +	.llseek		= generic_file_llseek,
> +};
> +
> +static int __init node_debug_init(void)
> +{
> +	if (!memhp_debug_root)
> +		memhp_debug_root = debugfs_create_dir("mem_hotplug", NULL);
> +	if (!memhp_debug_root)
> +		return -ENOMEM;
> +
> +	if (!debugfs_create_file("add_node", S_IWUSR, memhp_debug_root,
> +			NULL, &add_node_file_ops))
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +module_init(node_debug_init);
> +#endif /* CONFIG_DEBUG_FS */


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [5/7, v9] NUMA Hotplug Emulator: Support cpu probe/release in x86_64
  2010-12-10  7:31   ` shaohui.zheng
@ 2010-12-23  0:27     ` Andrew Morton
  -1 siblings, 0 replies; 61+ messages in thread
From: Andrew Morton @ 2010-12-23  0:27 UTC (permalink / raw)
  To: shaohui.zheng
  Cc: linux-mm, linux-kernel, haicheng.li, lethal, ak, shaohui.zheng,
	rientjes, dave, gregkh, Ingo Molnar, Len Brown, Yinghai Lu,
	Tejun Heo, Haicheng Li

On Fri, 10 Dec 2010 15:31:24 +0800
shaohui.zheng@intel.com wrote:

> From: Shaohui Zheng <shaohui.zheng@intel.com>
> 
> CPU physical hot-add/hot-remove is supported on some hardware, and it
> is already supported in the current Linux kernel. The NUMA Hotplug Emulator
> provides a mechanism to emulate the process in software. It can be used for
> testing or debugging purposes.
> 
> CPU physical hotplug is different from logical CPU online/offline. Logical
> online/offline is controlled by the interface
> /sys/devices/system/cpu/cpuX/online. The CPU hotplug emulator uses the
> probe/release interface, which makes it possible to automate and
> stress-test CPU hotplug.
> 
> Add the cpu probe/release interface under sysfs for x86_64. Users can use this
> interface to emulate the cpu hot-add and hot-remove process.
> 
> Directions:
> *) Reserve CPUs through a grub parameter like:
> 	maxcpus=4
> 
> The remaining CPUs will not be initialized.
> 
> *) Probe CPU
> we can use the probe interface to hot-add new CPUs:
> 	echo nid > /sys/devices/system/cpu/probe
> 
> *) Release a CPU
> 	echo cpu > /sys/devices/system/cpu/release
> 
> A reserved CPU will be hot-added to the specified node.
> 1) nid == 0: the CPU will be added to the real node that the CPU
> should be in.
> 2) nid != 0: add the CPU to node nid even though it is a fake node.
> 
>
> ...
>
> --- linux-hpe4.orig/arch/x86/kernel/topology.c	2010-12-10 14:39:43.333331000 +0800
> +++ linux-hpe4/arch/x86/kernel/topology.c	2010-12-10 14:49:56.043331000 +0800
> @@ -30,6 +30,9 @@
>  #include <linux/init.h>
>  #include <linux/smp.h>
>  #include <asm/cpu.h>
> +#include <linux/cpu.h>
> +#include <linux/topology.h>
> +#include <linux/acpi.h>
>  
>  static DEFINE_PER_CPU(struct x86_cpu, cpu_devices);
>  
> @@ -66,6 +69,78 @@
>  	unregister_cpu(&per_cpu(cpu_devices, num).cpu);
>  }
>  EXPORT_SYMBOL(arch_unregister_cpu);
> +
> +ssize_t arch_cpu_probe(const char *buf, size_t count)
> +{
> +	int nid = 0;
> +	int num = 0, selected = 0;

One definition per line makes for more maintainable code.

Two of these initialisations are unnecessary.

> +	/* check parameters */
> +	if (!buf || count < 2)
> +		return -EPERM;
> +
> +	nid = simple_strtoul(buf, NULL, 0);

checkpatch?

> +	printk(KERN_DEBUG "Add a cpu to node : %d\n", nid);

"Add a CPU to node %d" would make more sense.

> +	if (nid < 0 || nid > nr_node_ids - 1) {
> +		printk(KERN_ERR "Invalid NUMA node id: %d (0 <= nid < %d).\n",
> +			nid, nr_node_ids);
> +		return -EPERM;
> +	}
> +
> +	if (!node_online(nid)) {
> +		printk(KERN_ERR "NUMA node %d is not online, give up.\n", nid);

"giving"

> +		return -EPERM;
> +	}
> +
> +	/* find first uninitialized cpu */
> +	for_each_present_cpu(num) {

s/num/cpu/ would be conventional.  "num" is a pretty poor identifier in
general - it fails to identify what it is counting.

> +		if (per_cpu(cpu_sys_devices, num) == NULL) {
> +			selected = num;

Similarly, I'd have used "selected_cpu".

> +			break;
> +		}
> +	}
> +
> +	if (selected >= num_possible_cpus()) {
> +		printk(KERN_ERR "No free cpu, give up cpu probing.\n");
> +		return -EPERM;
> +	}
> +
> +	/* register cpu */
> +	arch_register_cpu_node(selected, nid);
> +	acpi_map_lsapic_emu(selected, nid);
> +
> +	return count;
> +}
> +EXPORT_SYMBOL(arch_cpu_probe);

arch_cpu_probe() is global and exported to modules, but is undocumented.

If it had been documented, I might have been able to work out why arg
`count' is checked, but never used.

> +ssize_t arch_cpu_release(const char *buf, size_t count)
> +{
> +	int cpu = 0;
> +
> +	cpu =  simple_strtoul(buf, NULL, 0);

unneeded initialisation, spurious whitespace, checkpatch.

> +	/* cpu 0 is not hotplugable */
> +	if (cpu == 0) {
> +		printk(KERN_ERR "can not release cpu 0.\n");

It's generally better to make kernel messages self-identifying. 
Especially error messages.  If someone comes along and sees "can not
release cpu 0" in their logs, they don't have a clue what caused it
unless they download the kernel sources and go grepping.
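
In the kernel the usual way to get this is a per-file pr_fmt() prefix picked up by pr_err() and friends. A user-space sketch of the same idea, assuming a made-up prefix "cpu_hotplug_emu" and a hypothetical helper format_release_error(); neither name comes from the patch:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Prefix every message from this file, in the style of the kernel's pr_fmt(). */
#define demo_fmt(fmt) "cpu_hotplug_emu: " fmt

/* Format the error into buf; returns the length snprintf() reports. */
int format_release_error(char *buf, size_t len, int cpu)
{
    assert(buf && len > 0);
    return snprintf(buf, len,
                    demo_fmt("cannot release cpu %d: not hotpluggable\n"),
                    cpu);
}
```

Anyone reading the log then sees which subsystem complained and why, without grepping the sources.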

> +		return -EPERM;
> +	}
> +
> +	if (cpu_online(cpu)) {
> +		printk(KERN_DEBUG "offline cpu %d.\n", cpu);
> +		if (!cpu_down(cpu)) {
> +			printk(KERN_ERR "fail to offline cpu %d, give up.\n", cpu);

"failed", "giving".

> +			return -EPERM;
> +		}
> +
> +	}
> +
> +	arch_unregister_cpu(cpu);
> +	acpi_unmap_lsapic(cpu);
> +
> +	return count;
> +}
> +EXPORT_SYMBOL(arch_cpu_release);

No documentation.
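
A side note on the cpu_down() check in arch_cpu_release() above: as far as I know cpu_down() returns 0 on success and a negative errno on failure, so "if (!cpu_down(cpu))" looks inverted and would report a failure exactly when the offline succeeded. A user-space sketch of the conventional shape, where fake_cpu_down() is a hypothetical stand-in and not the kernel function:

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for cpu_down(): 0 on success, negative errno on failure. */
static int fake_cpu_down(int cpu)
{
    return cpu == 3 ? -EBUSY : 0;   /* pretend cpu 3 refuses to go offline */
}

/* Returns 0 if the cpu was taken offline, or the error that stopped it. */
int release_cpu(int cpu)
{
    int ret;

    assert(cpu >= 0);

    if (cpu == 0)
        return -EPERM;              /* cpu 0 is not hotpluggable in the patch */

    ret = fake_cpu_down(cpu);
    if (ret)                        /* nonzero means failure; "!ret" would invert the test */
        return ret;

    return 0;
}
```

Returning the error from the offline attempt, rather than a blanket -EPERM, would also tell the user why the release failed.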

>  #else /* CONFIG_HOTPLUG_CPU */
>  
>  static int __init arch_register_cpu(int num)
> @@ -83,8 +158,14 @@
>  		register_one_node(i);
>  #endif
>  
> -	for_each_present_cpu(i)
> -		arch_register_cpu(i);
> +	/*
> +	 * when cpu hotplug emulation enabled, register the online cpu only,
> +	 * the rests are reserved for cpu probe.
> +	 */

Something like "When cpu hotplug emulation is enabled, register only
the online cpu.  The remainder are reserved for cpu probing.".


> +	for_each_present_cpu(i) {
> +		if ((cpu_hpe_on && cpu_online(i)) || !cpu_hpe_on)
> +			arch_register_cpu(i);
> +	}
>  
>  	return 0;
>  }
>
> ...
>
> --- linux-hpe4.orig/drivers/acpi/processor_driver.c	2010-12-10 13:42:34.593331000 +0800
> +++ linux-hpe4/drivers/acpi/processor_driver.c	2010-12-10 14:48:32.143331001 +0800
> @@ -542,6 +542,14 @@
>  		goto err_free_cpumask;
>  
>  	sysdev = get_cpu_sysdev(pr->id);
> +	/*
> +	 * Reserve cpu for hotplug emulation, the reserved cpu can be hot-added
> +	 * throu the cpu probe interface. Return directly.

s/emulation, the/emulation.  The/
s/throu/through/

> +	 */
> +	if (sysdev == NULL) {
> +		goto out;
> +	}

Unneeded braces.

>  	if (sysfs_create_link(&device->dev.kobj, &sysdev->kobj, "sysdev")) {
>  		result = -EFAULT;
>  		goto err_remove_fs;
> @@ -582,6 +590,7 @@
>  		goto err_remove_sysfs;
>  	}
>  
> +out:
>  	return 0;
>  
>
> ...
>
> --- linux-hpe4.orig/drivers/base/cpu.c	2010-12-10 14:39:43.333331000 +0800
> +++ linux-hpe4/drivers/base/cpu.c	2010-12-10 14:48:32.143331001 +0800
> @@ -22,9 +22,15 @@
>  };
>  EXPORT_SYMBOL(cpu_sysdev_class);
>  
> -static DEFINE_PER_CPU(struct sys_device *, cpu_sys_devices);
> +DEFINE_PER_CPU(struct sys_device *, cpu_sys_devices);
>  
>  #ifdef CONFIG_HOTPLUG_CPU
> +/*
> + * cpu_hpe_on is a switch to enable/disable cpu hotplug emulation. it is

s/it/It/.

> + * disabled in default, we can enable it throu grub parameter cpu_hpe=on

"through".

> + */
> +int cpu_hpe_on;

__read_mostly, perhaps.

>  static ssize_t show_online(struct sys_device *dev, struct sysdev_attribute *attr,
>  			   char *buf)
>  {
>
> ...
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [6/7, v9] NUMA Hotplug Emulator: Fake CPU socket with logical CPU on x86
  2010-12-10  7:31   ` shaohui.zheng
@ 2010-12-23  0:27     ` Andrew Morton
  -1 siblings, 0 replies; 61+ messages in thread
From: Andrew Morton @ 2010-12-23  0:27 UTC (permalink / raw)
  To: shaohui.zheng
  Cc: linux-mm, linux-kernel, haicheng.li, lethal, ak, shaohui.zheng,
	rientjes, dave, gregkh, Sam Ravnborg, Haicheng Li

On Fri, 10 Dec 2010 15:31:25 +0800
shaohui.zheng@intel.com wrote:

> From: Shaohui Zheng <shaohui.zheng@intel.com>
> 
> When hotplugging a CPU with the emulator, we use a logical CPU to emulate the
> CPU hotplug process. On CPUs that support SMT, some logical CPUs are in the
> same socket, but with the emulator they may be located in different NUMA nodes.
> This misleads the scheduler into building an incorrect scheduling domain
> hierarchy, and it causes the following call trace when rebalancing the
> scheduling domains:
> 
> divide error: 0000 [#1] SMP 
> last sysfs file: /sys/devices/system/cpu/cpu8/online
> CPU 0 
> Modules linked in: fbcon tileblit font bitblit softcursor radeon ttm drm_kms_helper e1000e usbhid via_rhine mii drm i2c_algo_bit igb dca
> Pid: 0, comm: swapper Not tainted 2.6.32hpe #78 X8DTN
> RIP: 0010:[<ffffffff81051da5>]  [<ffffffff81051da5>] find_busiest_group+0x6c5/0xa10
> RSP: 0018:ffff880028203c30  EFLAGS: 00010246
> RAX: 0000000000000000 RBX: 0000000000015ac0 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: ffff880277e8cfa0 RDI: 0000000000000000
> RBP: ffff880028203dc0 R08: ffff880277e8cfa0 R09: 0000000000000040
> R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 00007f16cfc85770 CR3: 0000000001001000 CR4: 00000000000006f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper (pid: 0, threadinfo ffffffff81822000, task ffffffff8184a600)
> Stack:
>  ffff880028203d60 ffff880028203cd0 ffff8801c204ff08 ffff880028203e38
> <0> 0101ffff81018c59 ffff880028203e44 00000001810806bd ffff8801c204fe00
> <0> 0000000528200000 ffffffff00000000 0000000000000018 0000000000015ac0
> Call Trace:
>  <IRQ> 
>  [<ffffffff81088ee0>] ? tick_dev_program_event+0x40/0xd0
>  [<ffffffff81053b2c>] rebalance_domains+0x17c/0x570
>  [<ffffffff81018c89>] ? read_tsc+0x9/0x20
>  [<ffffffff81088ee0>] ? tick_dev_program_event+0x40/0xd0
>  [<ffffffff810569ed>] run_rebalance_domains+0xbd/0xf0
>  [<ffffffff8106471f>] __do_softirq+0xaf/0x1e0
>  [<ffffffff810b7d18>] ? handle_IRQ_event+0x58/0x160
>  [<ffffffff810130ac>] call_softirq+0x1c/0x30
>  [<ffffffff81014a85>] do_softirq+0x65/0xa0
>  [<ffffffff810645cd>] irq_exit+0x7d/0x90
>  [<ffffffff81013ff0>] do_IRQ+0x70/0xe0
>  [<ffffffff810128d3>] ret_from_intr+0x0/0x11
>  <EOI> 
>  [<ffffffff8133387f>] ? acpi_idle_enter_bm+0x281/0x2b5
>  [<ffffffff81333878>] ? acpi_idle_enter_bm+0x27a/0x2b5
>  [<ffffffff8145dc8f>] ? cpuidle_idle_call+0x9f/0x130
>  [<ffffffff81010e2b>] ? cpu_idle+0xab/0x100
>  [<ffffffff8158aee6>] ? rest_init+0x66/0x70
>  [<ffffffff81905d90>] ? start_kernel+0x3e3/0x3ef
>  [<ffffffff8190533a>] ? x86_64_start_reservations+0x125/0x129
>  [<ffffffff81905438>] ? x86_64_start_kernel+0xfa/0x109
> Code: 00 00 e9 4c fb ff ff 0f 1f 80 00 00 00 00 48 8b b5 d8 fe ff ff 48 8b 45 a8 4d 29 ef 8b 56 08 48 c1 e0 0a 49 89 f0 48 89 d7 31 d2 <48> f7 f7 31 d2 48 89 45 a0 8b 76 08 4c 89 f0 48 c1 e0 0a 48 f7 
> RIP  [<ffffffff81051da5>] find_busiest_group+0x6c5/0xa10
>  RSP <ffff880028203c30>
> 
> Solution:
> 
> We put the logical CPU into a fake CPU socket and assign it a unique
> phys_proc_id. Each fake socket holds only one logical CPU. This
> method fixes the above bug.
> 
>
> ...
>
> --- linux-hpe4.orig/arch/x86/include/asm/processor.h	2010-11-17 09:00:51.354100239 +0800
> +++ linux-hpe4/arch/x86/include/asm/processor.h	2010-11-17 09:01:10.222837594 +0800
> @@ -113,6 +113,15 @@
>  	/* Index into per_cpu list: */
>  	u16			cpu_index;
>  #endif
> +
> +#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
> +	/*
> +	 * Use a logic cpu to emulate a physical cpu's hotplug. We put the

"logical".

> +	 * logical cpu into a fake socket, assign a fake physical id to it,
> +	 * and create a fake core.
> +	 */
> +	__u8		cpu_probe_on; /* A flag to enable cpu probe/release */
> +#endif
>  } __attribute__((__aligned__(SMP_CACHE_BYTES)));
>  
>  #define X86_VENDOR_INTEL	0
> Index: linux-hpe4/arch/x86/kernel/smpboot.c
> ===================================================================
> --- linux-hpe4.orig/arch/x86/kernel/smpboot.c	2010-11-17 09:01:10.202837209 +0800
> +++ linux-hpe4/arch/x86/kernel/smpboot.c	2010-11-17 09:01:10.222837594 +0800
> @@ -97,6 +97,7 @@
>   */
>  static DEFINE_MUTEX(x86_cpu_hotplug_driver_mutex);
>  
> +#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
>  void cpu_hotplug_driver_lock()
>  {
>          mutex_lock(&x86_cpu_hotplug_driver_mutex);
> @@ -106,6 +107,7 @@
>  {
>          mutex_unlock(&x86_cpu_hotplug_driver_mutex);
>  }
> +#endif
>  
>  #else
>  static struct task_struct *idle_thread_array[NR_CPUS] __cpuinitdata ;
> @@ -198,6 +200,8 @@
>  {
>  	int cpuid, phys_id;
>  	unsigned long timeout;
> +	u8 cpu_probe_on = 0;

Unneeded initialisation.

Does this cause an unused var warning when
CONFIG_ARCH_CPU_PROBE_RELEASE=n?

> +	struct cpuinfo_x86 *c;
>  
>  	/*
>  	 * If waken up by an INIT in an 82489DX configuration
>
> ...
>
> +#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
> +/*
> + * Put the logical cpu into a new sokect, and encapsule it into core 0.

That comment needs help.

> + */
> +static void fake_cpu_socket_info(int cpu)
> +{
> +	struct cpuinfo_x86 *c = &cpu_data(cpu);
> +	int i, phys_id = 0;
> +
> +	/* calculate the max phys_id */
> +	for_each_present_cpu(i) {
> +		struct cpuinfo_x86 *c = &cpu_data(i);
> +		if (phys_id < c->phys_proc_id)
> +			phys_id = c->phys_proc_id;
> +	}
> +
> +	c->phys_proc_id = phys_id + 1; /* pick up a unused phys_proc_id */
> +	c->cpu_core_id = 0; /* always put the logical cpu to core 0 */
> +	c->cpu_probe_on = 1;
> +}
> +
>
> ...
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [7/7, v9] NUMA Hotplug Emulator: Implement per-node add_memory debugfs interface
  2010-12-10  7:31   ` shaohui.zheng
@ 2010-12-23  0:27     ` Andrew Morton
  -1 siblings, 0 replies; 61+ messages in thread
From: Andrew Morton @ 2010-12-23  0:27 UTC (permalink / raw)
  To: shaohui.zheng
  Cc: linux-mm, linux-kernel, haicheng.li, lethal, ak, shaohui.zheng,
	rientjes, dave, gregkh, Haicheng Li

On Fri, 10 Dec 2010 15:31:26 +0800
shaohui.zheng@intel.com wrote:

> From:  Shaohui Zheng <shaohui.zheng@intel.com>
> 
> Add add_memory interface to support to memory hotplug emulation for each online
> node under debugfs. The reserved memory can be added into desired node with
> this interface.
> 
> The layout on debugfs:
> 	mem_hotplug/node0/add_memory
> 	mem_hotplug/node1/add_memory
> 	mem_hotplug/node2/add_memory
> 	...
> 
> Add a memory section(128M) to node 3(boots with mem=1024m)
> 
> 	echo 0x40000000 > mem_hotplug/node3/add_memory
> 
>
> ...
>
> +#ifdef CONFIG_ARCH_MEMORY_PROBE
> +
> +static ssize_t add_memory_store(struct file *file, const char __user *buf,
> +				size_t count, loff_t *ppos)
> +{
> +	u64 phys_addr = 0;

Even more unneeded initialisation.

Please check the whole patchset for this.  It's bad because it can
sometimes generate more code and because it can sometimes hide bugs by
suppressing used-uninitialised warnings.
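
To illustrate the bug-hiding point, here is a small standalone sketch. It
is not taken from the patchset; classify_nid() and its arguments are
hypothetical. Because ret has no initialiser, a compiler run with -Wall
can flag any branch that forgets to assign it, whereas "int ret = 0;"
would silence that diagnostic and let a stale 0 escape.

```c
/*
 * Hypothetical helper: classify a node id against an upper bound.
 * Returns -1 if nid is negative, 1 if it is too large, 0 if it is valid.
 */
static int classify_nid(int nid, int max_nid)
{
	int ret;	/* deliberately not initialised */

	if (nid < 0)
		ret = -1;
	else if (nid >= max_nid)
		ret = 1;
	else
		ret = 0;	/* every path assigns ret before it is read */

	return ret;
}
```

If one of the three assignments were dropped, the missing initialiser is
what gives the compiler a chance to notice.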

> +	int nid = file->private_data - NULL;

Well that was sneaky.

It would be more conventional to just use the typecast:

	int nid = (long)file->private_data;
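
A minimal userspace sketch of that cast-based round trip, for comparison
with the NULL + nid arithmetic. The helper names nid_to_private() and
private_to_nid() are invented for illustration and do not appear in the
patch; intptr_t stands in for the kernel's (long) cast.

```c
#include <stdint.h>

/* Hypothetical helper: stash a small integer in a void * cookie. */
static void *nid_to_private(int nid)
{
	/* Widen to a pointer-sized integer first, then convert. */
	return (void *)(intptr_t)nid;
}

/* Hypothetical helper: recover the integer from the cookie. */
static int private_to_nid(void *private_data)
{
	/* Reverse conversion; no pointer arithmetic on NULL is involved. */
	return (int)(intptr_t)private_data;
}
```

The same pair of casts would be used on both sides: once when the value
is handed to debugfs_create_file() and once in the write handler.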


> +	int ret;
> +
> +	printk(KERN_INFO "Add a memory section to node: %d.\n", nid);
> +	phys_addr = simple_strtoull(buf, NULL, 0);

checkpatch
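
The terse note above presumably refers to checkpatch's complaint about
simple_strtoull(), which stops at the first invalid character and returns
whatever it has parsed so far. A rough userspace sketch of what stricter
parsing looks like; parse_u64_strict() is a hypothetical stand-in built
on strtoull(), not a kernel function.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical strict parser: accepts decimal, octal (0...) or hex (0x...)
 * with an optional trailing newline, and rejects everything else.
 * Returns 0 on success or a negative errno value.
 */
static int parse_u64_strict(const char *s, unsigned long long *out)
{
	char *end;
	unsigned long long val;

	/* Reject empty input, leading whitespace and signs up front. */
	if (!isdigit((unsigned char)*s))
		return -EINVAL;

	errno = 0;
	val = strtoull(s, &end, 0);
	if (errno == ERANGE)
		return -ERANGE;		/* value does not fit in 64 bits */
	if (*end == '\n')
		end++;			/* tolerate the newline echo appends */
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage */

	*out = val;
	return 0;
}
```

With something like this, "echo 0x40000000" is accepted while input such
as "12abc" is reported as an error rather than being treated as 12.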

> +	ret = add_memory(nid, phys_addr, PAGES_PER_SECTION << PAGE_SHIFT);
> +	if (ret)
> +		count = ret;
> +
> +	return count;
> +}
> +
> +static int add_memory_open(struct inode *inode, struct file *file)
> +{
> +	file->private_data = inode->i_private;

Was this usage of i_private and private_data documented in comments
somewhere?

> +	return 0;
> +}
> +
> +static const struct file_operations add_memory_file_ops = {
> +	.open		= add_memory_open,
> +	.write		= add_memory_store,
> +	.llseek		= generic_file_llseek,
> +};
> +
> +/*
> + * Create add_memory debugfs entry under specified node
> + */
> +static int debugfs_create_add_memory_entry(int nid)
> +{
> +	char buf[32];
> +	static struct dentry *node_debug_root;
> +
> +	snprintf(buf, sizeof(buf), "node%d", nid);
> +	node_debug_root = debugfs_create_dir(buf, memhp_debug_root);
> +	if (!node_debug_root)
> +		return -ENOMEM;

hm, debugfs_create_dir() was poorly designed - it should return an
ERR_PTR() so callers don't need to assume ENOMEM, which may be incorrect.
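
For readers unfamiliar with the idiom, here is a userspace sketch of how
an ERR_PTR()-style return lets the caller propagate the real error code.
The lower-case helpers mirror ERR_PTR(), PTR_ERR() and IS_ERR() from
<linux/err.h>; create_dir() and create_entry() are hypothetical and are
not how debugfs behaves in this patchset.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in the top page of the address space. */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int dummy_dir;	/* stands in for a real directory object */

/* Hypothetical creator that reports *why* it failed. */
static void *create_dir(const char *name)
{
	if (!name || !*name)
		return err_ptr(-EINVAL);
	return &dummy_dir;
}

/* The caller forwards the real error instead of guessing -ENOMEM. */
static int create_entry(const char *name)
{
	void *dir = create_dir(name);

	if (is_err(dir))
		return (int)ptr_err(dir);
	return 0;
}
```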

> +	/* the nid information was represented by the offset of pointer(NULL+nid) */
> +	if (!debugfs_create_file("add_memory", S_IWUSR, node_debug_root,
> +			NULL + nid, &add_memory_file_ops))
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +static int __init memory_debug_init(void)
> +{
> +	int nid;
> +
> +	if (!memhp_debug_root)
> +		memhp_debug_root = debugfs_create_dir("mem_hotplug", NULL);
> +	if (!memhp_debug_root)
> +		return -ENOMEM;
> +
> +	for_each_online_node(nid)
> +		 debugfs_create_add_memory_entry(nid);
> +
> +	return 0;
> +}
> +
> +module_init(memory_debug_init);
> +#else
> +static debugfs_create_add_memory_entry(int nid)

"static int".

> +{
> +	return 0;
> +}
> +#endif /* CONFIG_ARCH_MEMORY_PROBE */
> +
>  static ssize_t add_node_store(struct file *file, const char __user *buf,
>  				size_t count, loff_t *ppos)
>  {
> @@ -963,6 +1038,8 @@
>  		return -ENOMEM;
>  
>  	ret = add_memory(nid, start, size);
> +
> +	debugfs_create_add_memory_entry(nid);
>  	return ret ? ret : count;
>  }
>  
>
> ...
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [2/7, v9] NUMA Hotplug Emulator: Add numa=possible option
  2010-12-23  0:27     ` Andrew Morton
@ 2010-12-23  1:14       ` David Rientjes
  -1 siblings, 0 replies; 61+ messages in thread
From: David Rientjes @ 2010-12-23  1:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Haicheng Li, lethal, Andi Kleen,
	shaohui.zheng, dave, Greg KH

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1413 bytes --]

On Wed, 22 Dec 2010, Andrew Morton wrote:

> > @@ -646,6 +647,15 @@ void __init initmem_init(unsigned long start_pfn, unsigned long last_pfn,
> >  		numa_set_node(i, 0);
> >  	memblock_x86_register_active_regions(0, start_pfn, last_pfn);
> >  	setup_node_bootmem(0, start_pfn << PAGE_SHIFT, last_pfn << PAGE_SHIFT);
> > +out: __maybe_unused
> 
> hm, I didn't know you could do that with labels.
> 
> Does it work?
> 

Yeah, it's equivalent to __attribute__((unused)) and according to the gcc 
manual section 6.30:

	In GNU C, an attribute specifier list may appear after the colon 
	following a label, other than a case or default label. The only 
	attribute it makes sense to use after a label is unused. This 
	feature is intended for code generated by programs which contains 
	labels that may be unused but which is compiled with ‘-Wall’. It 
	would not normally be appropriate to use in it human-written code, 
	though it could be useful in cases where the code that jumps to 
	the label is contained within an #ifdef conditional.

I used it because I knew I wouldn't get away with putting a label inside 
an #ifdef :)
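
A stripped-down standalone example of the same trick, using made-up
names (CONFIG_SKIP_BOOT_NODE, mark_possible()) rather than the real
initmem_init() code. With the option undefined, the goto is compiled out
and the label would normally trigger -Wunused-label; the attribute
suppresses that. The semicolon after the attribute makes the label apply
to an empty statement.

```c
#define MAX_NODES 8

/* Hypothetical config switch; leave undefined to compile out the goto. */
/* #define CONFIG_SKIP_BOOT_NODE 1 */

/*
 * Mark node 0 as possible (unless skipped), then mark up to 'extra'
 * additional unset nodes. Returns how many extra nodes were marked.
 */
static int mark_possible(unsigned char *possible, int extra)
{
	int i, added = 0;

#ifdef CONFIG_SKIP_BOOT_NODE
	goto out;
#endif
	possible[0] = 1;

out: __attribute__((unused));
	for (i = 0; i < MAX_NODES && added < extra; i++) {
		if (!possible[i]) {
			possible[i] = 1;
			added++;
		}
	}
	return added;
}
```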

> > +	for (i = 0; i < numa_possible_nodes; i++) {
> > +		int nid;
> > +
> > +		nid = first_unset_node(node_possible_map);
> > +		if (nid == MAX_NUMNODES)
> > +			break;
> > +		node_set(nid, node_possible_map);
> > +	}
> >  }
> >  
> >  unsigned long __init numa_free_all_bootmem(void)

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [5/7, v9] NUMA Hotplug Emulator: Support cpu probe/release in x86_64
  2010-12-23  0:27     ` Andrew Morton
@ 2010-12-23  1:34       ` Shaohui Zheng
  -1 siblings, 0 replies; 61+ messages in thread
From: Shaohui Zheng @ 2010-12-23  1:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: shaohui.zheng, linux-mm, linux-kernel, haicheng.li, lethal, ak,
	rientjes, dave, gregkh, Ingo Molnar, Len Brown, Yinghai Lu,
	Tejun Heo, Haicheng Li

On Wed, Dec 22, 2010 at 04:27:27PM -0800, Andrew Morton wrote:
> On Fri, 10 Dec 2010 15:31:24 +0800
> > +
> > +ssize_t arch_cpu_probe(const char *buf, size_t count)
> > +{
> > +	int nid = 0;
> > +	int num = 0, selected = 0;
> 
> One definition per line make for more maintainable code.
> 
> Two of these initialisations are unnecessary.
> 
Agreed. I will put them on two lines and remove the initialisations.
I always try to initialise variables when defining them; it seems that is a bad habit.

> > +	/* check parameters */
> > +	if (!buf || count < 2)
> > +		return -EPERM;
> > +
> > +	nid = simple_strtoul(buf, NULL, 0);
> 
> checkpatch?

It is only a warning, so I ignored it.
I will fix it.

> 
> > +	printk(KERN_DEBUG "Add a cpu to node : %d\n", nid);
> 
> "Add a CPU to node %d" would make more sense.
> 

Got it.

> > +	if (nid < 0 || nid > nr_node_ids - 1) {
> > +		printk(KERN_ERR "Invalid NUMA node id: %d (0 <= nid < %d).\n",
> > +			nid, nr_node_ids);
> > +		return -EPERM;
> > +	}
> > +
> > +	if (!node_online(nid)) {
> > +		printk(KERN_ERR "NUMA node %d is not online, give up.\n", nid);
> 
> "giving"
> 

Got it.

> > +		return -EPERM;
> > +	}
> > +
> > +	/* find first uninitialized cpu */
> > +	for_each_present_cpu(num) {
> 
> s/num/cpu/ would be conventional.  "num" is a pretty poor identifier in
> general - it fails to identify what it is counting.
> 

I will replace the identifier 'num' with 'cpu'.

> > +		if (per_cpu(cpu_sys_devices, num) == NULL) {
> > +			selected = num;
> 
> Similarly, I'd have used "selected_cpu".
> 

Got it.

> > +			break;
> > +		}
> > +	}
> > +
> > +	if (selected >= num_possible_cpus()) {
> > +		printk(KERN_ERR "No free cpu, give up cpu probing.\n");
> > +		return -EPERM;
> > +	}
> > +
> > +	/* register cpu */
> > +	arch_register_cpu_node(selected, nid);
> > +	acpi_map_lsapic_emu(selected, nid);
> > +
> > +	return count;
> > +}
> > +EXPORT_SYMBOL(arch_cpu_probe);
> 
> arch_cpu_probe() is global and exported to modules, but is undocumented.
> 
> If it had been documented, I might have been able to work out why arg
> `count' is checked, but never used.
> 

Sorry, Andrew, I did not follow. Do you mean I should add documentation
before the definition of arch_cpu_probe()?

> > +ssize_t arch_cpu_release(const char *buf, size_t count)
> > +{
> > +	int cpu = 0;
> > +
> > +	cpu =  simple_strtoul(buf, NULL, 0);
> 
> unneeded initialisation, spurious whitespace, checkpatch.
> 

Agreed.

> > +	/* cpu 0 is not hotplugable */
> > +	if (cpu == 0) {
> > +		printk(KERN_ERR "can not release cpu 0.\n");
> 
> It's generally better to make kernel messages self-identifying. 
> Especially error messages.  If someone comes along and sees "can not
> release cpu 0" in their logs, they don't have a clue what caused it
> unless they download the kernel sources and go grepping.
> 

How about "arch_cpu_release: can not release cpu 0.\n"?
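
One way to get such a prefix without typing the function name into every
string is to prepend __func__ centrally, similar in spirit to the
kernel's pr_fmt() convention. A rough userspace sketch; format_msg(),
report_err() and release_cpu_msg() are names invented here and are not
part of the patch.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Write "<func>: <formatted message>" into buf. Returns the length of
 * the formatted message part, or -1 if buf has no room at all. */
static int format_msg(char *buf, size_t len, const char *func,
		      const char *fmt, ...)
{
	va_list ap;
	size_t used;
	int n;

	if (len == 0)
		return -1;

	snprintf(buf, len, "%s: ", func);	/* always NUL-terminated */
	used = strlen(buf);

	va_start(ap, fmt);
	n = vsnprintf(buf + used, len - used, fmt, ap);
	va_end(ap);
	return n;
}

/* The macro supplies the caller's name so messages identify themselves. */
#define report_err(buf, len, ...) \
	format_msg(buf, len, __func__, __VA_ARGS__)

/* Hypothetical caller standing in for arch_cpu_release(). */
static int release_cpu_msg(char *buf, size_t len, int cpu)
{
	return report_err(buf, len, "can not release cpu %d", cpu);
}
```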

> > +		return -EPERM;
> > +	}
> > +
> > +	if (cpu_online(cpu)) {
> > +		printk(KERN_DEBUG "offline cpu %d.\n", cpu);
> > +		if (!cpu_down(cpu)) {
> > +			printk(KERN_ERR "fail to offline cpu %d, give up.\n", cpu);
> 
> "failed", "giving".
> 

Got it.

> > +			return -EPERM;
> > +		}
> > +
> > +	}
> > +
> > +	arch_unregister_cpu(cpu);
> > +	acpi_unmap_lsapic(cpu);
> > +
> > +	return count;
> > +}
> > +EXPORT_SYMBOL(arch_cpu_release);
> 
> No documentation.
> 

Sorry, this is the same as for arch_cpu_probe(): I did not follow the
problem. Should I add documentation before the definition or the
declaration, or add it to the Documentation/ directory?

> >  #else /* CONFIG_HOTPLUG_CPU */
> >  
> >  static int __init arch_register_cpu(int num)
> > @@ -83,8 +158,14 @@
> >  		register_one_node(i);
> >  #endif
> >  
> > -	for_each_present_cpu(i)
> > -		arch_register_cpu(i);
> > +	/*
> > +	 * when cpu hotplug emulation enabled, register the online cpu only,
> > +	 * the rests are reserved for cpu probe.
> > +	 */
> 
> Something like "When cpu hotplug emulation is enabled, register only
> the online cpu.  The remainder are reserved for cpu probing.".
> 
> 

Got it.

> > +	for_each_present_cpu(i) {
> > +		if ((cpu_hpe_on && cpu_online(i)) || !cpu_hpe_on)
> > +			arch_register_cpu(i);
> > +	}
> >  
> >  	return 0;
> >  }
> >
> > ...
> >
> > --- linux-hpe4.orig/drivers/acpi/processor_driver.c	2010-12-10 13:42:34.593331000 +0800
> > +++ linux-hpe4/drivers/acpi/processor_driver.c	2010-12-10 14:48:32.143331001 +0800
> > @@ -542,6 +542,14 @@
> >  		goto err_free_cpumask;
> >  
> >  	sysdev = get_cpu_sysdev(pr->id);
> > +	/*
> > +	 * Reserve cpu for hotplug emulation, the reserved cpu can be hot-added
> > +	 * throu the cpu probe interface. Return directly.
> 
> s/emulation, the/emulation.  The/
> s/throu/through/
> 
> > +	 */
> > +	if (sysdev == NULL) {
> > +		goto out;
> > +	}
> 
> Unneeded braces.
> 
> >  	if (sysfs_create_link(&device->dev.kobj, &sysdev->kobj, "sysdev")) {
> >  		result = -EFAULT;
> >  		goto err_remove_fs;
> > @@ -582,6 +590,7 @@
> >  		goto err_remove_sysfs;
> >  	}
> >  
> > +out:
> >  	return 0;
> >  
> >
> > ...
> >
> > --- linux-hpe4.orig/drivers/base/cpu.c	2010-12-10 14:39:43.333331000 +0800
> > +++ linux-hpe4/drivers/base/cpu.c	2010-12-10 14:48:32.143331001 +0800
> > @@ -22,9 +22,15 @@
> >  };
> >  EXPORT_SYMBOL(cpu_sysdev_class);
> >  
> > -static DEFINE_PER_CPU(struct sys_device *, cpu_sys_devices);
> > +DEFINE_PER_CPU(struct sys_device *, cpu_sys_devices);
> >  
> >  #ifdef CONFIG_HOTPLUG_CPU
> > +/*
> > + * cpu_hpe_on is a switch to enable/disable cpu hotplug emulation. it is
> 
> s/it/It/.
> 
> > + * disabled in default, we can enable it throu grub parameter cpu_hpe=on
> 
> "through".
> 
> > + */
> > +int cpu_hpe_on;
> 
> __read_mostly, perhaps.
> 

CPU hotplug emulation is for debugging purposes, so cpu_hpe_on is not used very frequently.

> >  static ssize_t show_online(struct sys_device *dev, struct sysdev_attribute *attr,
> >  			   char *buf)
> >  {
> >
> > ...
> >

-- 
Thanks & Regards,
Shaohui


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [5/7, v9] NUMA Hotplug Emulator: Support cpu probe/release in x86_64
@ 2010-12-23  1:34       ` Shaohui Zheng
  0 siblings, 0 replies; 61+ messages in thread
From: Shaohui Zheng @ 2010-12-23  1:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: shaohui.zheng, linux-mm, linux-kernel, haicheng.li, lethal, ak,
	rientjes, dave, gregkh, Ingo Molnar, Len Brown, Yinghai Lu,
	Tejun Heo, Haicheng Li

On Wed, Dec 22, 2010 at 04:27:27PM -0800, Andrew Morton wrote:
> On Fri, 10 Dec 2010 15:31:24 +0800
> > +
> > +ssize_t arch_cpu_probe(const char *buf, size_t count)
> > +{
> > +	int nid = 0;
> > +	int num = 0, selected = 0;
> 
> One definition per line make for more maintainable code.
> 
> Two of these initialisations are unnecessary.
> 
Agree, I will put them into 2 lines, and remove the initialisations.
I always try to initialize them when we define it, it seems that it is a bad habit.

> > +	/* check parameters */
> > +	if (!buf || count < 2)
> > +		return -EPERM;
> > +
> > +	nid = simple_strtoul(buf, NULL, 0);
> 
> checkpatch?

it is a warning, so I ignore it.
I will solve it.

> 
> > +	printk(KERN_DEBUG "Add a cpu to node : %d\n", nid);
> 
> "Add a CPU to node %d" would make more sense.
> 

Get it.

> > +	if (nid < 0 || nid > nr_node_ids - 1) {
> > +		printk(KERN_ERR "Invalid NUMA node id: %d (0 <= nid < %d).\n",
> > +			nid, nr_node_ids);
> > +		return -EPERM;
> > +	}
> > +
> > +	if (!node_online(nid)) {
> > +		printk(KERN_ERR "NUMA node %d is not online, give up.\n", nid);
> 
> "giving"
> 

Get it.

> > +		return -EPERM;
> > +	}
> > +
> > +	/* find first uninitialized cpu */
> > +	for_each_present_cpu(num) {
> 
> s/num/cpu/ would be conventional.  "num" is a pretty poor identifier in
> general - it fails to identify what it is counting.
> 

I will replace the identifier 'num' with 'cpu'.

> > +		if (per_cpu(cpu_sys_devices, num) == NULL) {
> > +			selected = num;
> 
> Similarly, I'd have used "selected_cpu".
> 

Get it.

> > +			break;
> > +		}
> > +	}
> > +
> > +	if (selected >= num_possible_cpus()) {
> > +		printk(KERN_ERR "No free cpu, give up cpu probing.\n");
> > +		return -EPERM;
> > +	}
> > +
> > +	/* register cpu */
> > +	arch_register_cpu_node(selected, nid);
> > +	acpi_map_lsapic_emu(selected, nid);
> > +
> > +	return count;
> > +}
> > +EXPORT_SYMBOL(arch_cpu_probe);
> 
> arch_cpu_probe() is global and exported to modules, but is undocumented.
> 
> If it had been documented, I might have been able to work out why arg
> `count' is checked, but never used.
> 

Sorry, Andrew, I did not catch it. Do you mean to add the document before
 the definition of the function arch_cpu_probe?

> > +ssize_t arch_cpu_release(const char *buf, size_t count)
> > +{
> > +	int cpu = 0;
> > +
> > +	cpu =  simple_strtoul(buf, NULL, 0);
> 
> unneeded initialisation, spurious whitespace, checkpatch.
> 

Agree.

> > +	/* cpu 0 is not hotplugable */
> > +	if (cpu == 0) {
> > +		printk(KERN_ERR "can not release cpu 0.\n");
> 
> It's generally better to make kernel messages self-identifying. 
> Especially error messages.  If someone comes along and sees "can not
> release cpu 0" in their logs, they don't have a clue what caused it
> unless they download the kernel sources and go grepping.
> 

How about "arch_cpu_release: can not release cpu 0.\n"?

> > +		return -EPERM;
> > +	}
> > +
> > +	if (cpu_online(cpu)) {
> > +		printk(KERN_DEBUG "offline cpu %d.\n", cpu);
> > +		if (!cpu_down(cpu)) {
> > +			printk(KERN_ERR "fail to offline cpu %d, give up.\n", cpu);
> 
> "failed", "giving".
> 

Get it.

> > +			return -EPERM;
> > +		}
> > +
> > +	}
> > +
> > +	arch_unregister_cpu(cpu);
> > +	acpi_unmap_lsapic(cpu);
> > +
> > +	return count;
> > +}
> > +EXPORT_SYMBOL(arch_cpu_release);
> 
> No documentation.
> 

Sorry, It is the same with function arch_cpu_probe, I did not catch the
problem, should I add documentation before the definition or declaration? Or
add the documentation into directory Documentation/.

> >  #else /* CONFIG_HOTPLUG_CPU */
> >  
> >  static int __init arch_register_cpu(int num)
> > @@ -83,8 +158,14 @@
> >  		register_one_node(i);
> >  #endif
> >  
> > -	for_each_present_cpu(i)
> > -		arch_register_cpu(i);
> > +	/*
> > +	 * when cpu hotplug emulation enabled, register the online cpu only,
> > +	 * the rests are reserved for cpu probe.
> > +	 */
> 
> Something like "When cpu hotplug emulation is enabled, register only
> the online cpu.  The remainder are reserved for cpu probing.".
> 
> 

Get it.

> > +	for_each_present_cpu(i) {
> > +		if ((cpu_hpe_on && cpu_online(i)) || !cpu_hpe_on)
> > +			arch_register_cpu(i);
> > +	}
> >  
> >  	return 0;
> >  }
> >
> > ...
> >
> > --- linux-hpe4.orig/drivers/acpi/processor_driver.c	2010-12-10 13:42:34.593331000 +0800
> > +++ linux-hpe4/drivers/acpi/processor_driver.c	2010-12-10 14:48:32.143331001 +0800
> > @@ -542,6 +542,14 @@
> >  		goto err_free_cpumask;
> >  
> >  	sysdev = get_cpu_sysdev(pr->id);
> > +	/*
> > +	 * Reserve cpu for hotplug emulation, the reserved cpu can be hot-added
> > +	 * throu the cpu probe interface. Return directly.
> 
> s/emulation, the/emulation.  The/
> s/throu/through/
> 
> > +	 */
> > +	if (sysdev == NULL) {
> > +		goto out;
> > +	}
> 
> Unneeded braces.
> 
> >  	if (sysfs_create_link(&device->dev.kobj, &sysdev->kobj, "sysdev")) {
> >  		result = -EFAULT;
> >  		goto err_remove_fs;
> > @@ -582,6 +590,7 @@
> >  		goto err_remove_sysfs;
> >  	}
> >  
> > +out:
> >  	return 0;
> >  
> >
> > ...
> >
> > --- linux-hpe4.orig/drivers/base/cpu.c	2010-12-10 14:39:43.333331000 +0800
> > +++ linux-hpe4/drivers/base/cpu.c	2010-12-10 14:48:32.143331001 +0800
> > @@ -22,9 +22,15 @@
> >  };
> >  EXPORT_SYMBOL(cpu_sysdev_class);
> >  
> > -static DEFINE_PER_CPU(struct sys_device *, cpu_sys_devices);
> > +DEFINE_PER_CPU(struct sys_device *, cpu_sys_devices);
> >  
> >  #ifdef CONFIG_HOTPLUG_CPU
> > +/*
> > + * cpu_hpe_on is a switch to enable/disable cpu hotplug emulation. it is
> 
> s/it/It/.
> 
> > + * disabled in default, we can enable it throu grub parameter cpu_hpe=on
> 
> "through".
> 
> > + */
> > +int cpu_hpe_on;
> 
> __read_mostly, perhaps.
> 

CPU hotplug emulation is for debugging purposes, so cpu_hpe_on is not accessed very frequently.

> >  static ssize_t show_online(struct sys_device *dev, struct sysdev_attribute *attr,
> >  			   char *buf)
> >  {
> >
> > ...
> >

-- 
Thanks & Regards,
Shaohui

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [3/7, v9] NUMA Hotplug Emulator: Add node hotplug emulation
  2010-12-23  0:27     ` Andrew Morton
@ 2010-12-23  1:38       ` David Rientjes
  -1 siblings, 0 replies; 61+ messages in thread
From: David Rientjes @ 2010-12-23  1:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Haicheng Li, lethal, Andi Kleen,
	Shaohui Zheng, dave, Greg KH

On Wed, 22 Dec 2010, Andrew Morton wrote:

> > Index: linux-hpe4/mm/memory_hotplug.c
> > ===================================================================
> > --- linux-hpe4.orig/mm/memory_hotplug.c	2010-11-30 12:40:43.757622001 +0800
> > +++ linux-hpe4/mm/memory_hotplug.c	2010-11-30 14:02:33.877622002 +0800
> > @@ -924,3 +924,63 @@
> >  }
> >  #endif /* CONFIG_MEMORY_HOTREMOVE */
> >  EXPORT_SYMBOL_GPL(remove_memory);
> > +
> > +#ifdef CONFIG_DEBUG_FS
> > +#include <linux/debugfs.h>
> > +
> > +static struct dentry *memhp_debug_root;
> > +
> > +static ssize_t add_node_store(struct file *file, const char __user *buf,
> > +				size_t count, loff_t *ppos)
> > +{
> > +	nodemask_t mask;
> 
> NODEMASK_ALLOC()?
> 

We traditionally haven't been using NODEMASK_ALLOC() in sysfs (or, in this 
case, debugfs) functions because they're never deep in a call chain.  Even 
for 4K node support, which isn't a supported config on any arch that 
allows CONFIG_MEMORY_HOTPLUG, this would only be 512 bytes on the short 
stack.

I agree with the remainder of the points in your review and will be 
sending fixes against -mm, thanks!
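
To make the 512-byte figure above concrete: nodemask_t is a bitmap with one bit
per possible node, so its size is the node count divided by eight, rounded up
to whole longs.  A minimal user-space sketch of that arithmetic follows;
nodemask_bytes() and BITS_PER_LONG_SKETCH are illustrative names, not kernel
symbols.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's BITS_PER_LONG. */
#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)

/*
 * Bytes occupied by a bitmap holding one bit per possible node,
 * rounded up to a whole number of unsigned longs.
 */
static size_t nodemask_bytes(size_t max_nodes)
{
	size_t longs;

	assert(max_nodes > 0);
	longs = (max_nodes + BITS_PER_LONG_SKETCH - 1) / BITS_PER_LONG_SKETCH;
	return longs * sizeof(unsigned long);
}
```

For 4096 nodes this gives 512 bytes, which is the on-stack cost being weighed
against a NODEMASK_ALLOC() allocation.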

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [7/7, v9] NUMA Hotplug Emulator: Implement per-node add_memory debugfs interface
  2010-12-23  0:27     ` Andrew Morton
@ 2010-12-23  2:00       ` Shaohui Zheng
  -1 siblings, 0 replies; 61+ messages in thread
From: Shaohui Zheng @ 2010-12-23  2:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: shaohui.zheng, linux-mm, linux-kernel, haicheng.li, lethal, ak,
	rientjes, dave, gregkh, Haicheng Li

On Wed, Dec 22, 2010 at 04:27:36PM -0800, Andrew Morton wrote:
> On Fri, 10 Dec 2010 15:31:26 +0800
> shaohui.zheng@intel.com wrote:
> 
> > From:  Shaohui Zheng <shaohui.zheng@intel.com>
> > 
> > Add add_memory interface to support to memory hotplug emulation for each online
> > node under debugfs. The reserved memory can be added into desired node with
> > this interface.
> > 
> > The layout on debugfs:
> > 	mem_hotplug/node0/add_memory
> > 	mem_hotplug/node1/add_memory
> > 	mem_hotplug/node2/add_memory
> > 	...
> > 
> > Add a memory section(128M) to node 3(boots with mem=1024m)
> > 
> > 	echo 0x40000000 > mem_hotplug/node3/add_memory
> > 
> >
> > ...
> >
> > +#ifdef CONFIG_ARCH_MEMORY_PROBE
> > +
> > +static ssize_t add_memory_store(struct file *file, const char __user *buf,
> > +				size_t count, loff_t *ppos)
> > +{
> > +	u64 phys_addr = 0;
> 
> Even more unneeded initialisation.
> 
> Please check the whole patchset for this.  It's bad because it can
> sometimes generate more code and because it can sometimes hide bugs by
> suppressing used-uninitialised warnings.
> 

Yes, it is my habit to initialize a variable when I define it. I will check them
one by one.

> > +	int nid = file->private_data - NULL;
> 
> Well that was sneaky.
> 
> It would be more conventional to just use the typecast:
> 
> 	int nid = (long)file->private_data;
> 
> 

An explicit typecast looks much better.
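
As a rough illustration of the typecast approach: a small integer such as a
node id can be stored in a void * slot (like private_data) with explicit casts
in both directions, instead of pointer arithmetic on NULL.  This is a
user-space sketch only; nid_to_cookie() and cookie_to_nid() are hypothetical
helper names, and it assumes long is as wide as a pointer, as it is in the
kernel.

```c
#include <assert.h>

/* Pack a non-negative node id into a pointer-sized cookie. */
static void *nid_to_cookie(int nid)
{
	assert(nid >= 0);
	return (void *)(long)nid;
}

/* Recover the node id; mirrors "int nid = (long)file->private_data;". */
static int cookie_to_nid(void *cookie)
{
	return (int)(long)cookie;
}
```

Arithmetic on a null pointer is not defined by the C standard, so the cast
form is also the more portable of the two.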

> > +	int ret;
> > +
> > +	printk(KERN_INFO "Add a memory section to node: %d.\n", nid);
> > +	phys_addr = simple_strtoull(buf, NULL, 0);
> 
> checkpatch
> 

We ignored the checkpatch warning about simple_strtoull throughout the patchset.
We will fix the occurrences one by one.

> > +	ret = add_memory(nid, phys_addr, PAGES_PER_SECTION << PAGE_SHIFT);
> > +	if (ret)
> > +		count = ret;
> > +
> > +	return count;
> > +}
> > +
> > +static int add_memory_open(struct inode *inode, struct file *file)
> > +{
> > +	file->private_data = inode->i_private;
> 
> Was this usage of i_private and private_data documented in comments
> somewhere?
> 

Yes, I added the usage information where the add_memory entry is created. It seems
that I should also add a comment here.

/* the nid information was represented by the offset of pointer(NULL+nid) */
	if (!debugfs_create_file("add_memory", S_IWUSR, node_debug_root,
			NULL + nid, &add_memory_file_ops))

> > +	return 0;
> > +}
> > +
> > +static const struct file_operations add_memory_file_ops = {
> > +	.open		= add_memory_open,
> > +	.write		= add_memory_store,
> > +	.llseek		= generic_file_llseek,
> > +};
> > +
> > +/*
> > + * Create add_memory debugfs entry under specified node
> > + */
> > +static int debugfs_create_add_memory_entry(int nid)
> > +{
> > +	char buf[32];
> > +	static struct dentry *node_debug_root;
> > +
> > +	snprintf(buf, sizeof(buf), "node%d", nid);
> > +	node_debug_root = debugfs_create_dir(buf, memhp_debug_root);
> > +	if (!node_debug_root)
> > +		return -ENOMEM;
> 
> hm, debugfs_create_dir() was poorly designed - it should return an
> ERR_PTR() so callers don't need to assume ENOMEM, which may be incorrect.
> 

Totally agree. I looked at similar callers of debugfs_create_dir. On failure,
most of them assume ENOMEM, and some of them assume EINVAL.
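
For reference, the ERR_PTR() convention mentioned above encodes a negative
errno in the pointer value itself, so the caller can recover the real error
instead of guessing.  The following is a user-space sketch of the idea, not
the kernel implementation; the sketch_* names and SKETCH_MAX_ERRNO are
hypothetical stand-ins for ERR_PTR(), PTR_ERR(), IS_ERR() and MAX_ERRNO.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed upper bound on errno values, modelled on the kernel's MAX_ERRNO. */
#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno as a pointer in the top page of the address space. */
static inline void *sketch_err_ptr(long err)
{
	assert(err < 0 && err >= -SKETCH_MAX_ERRNO);
	return (void *)(intptr_t)err;
}

/* Decode the errno from an error pointer. */
static inline long sketch_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* True if p lies in the range reserved for encoded errors. */
static inline int sketch_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}
```

With such a return convention a caller could write "if (sketch_is_err(d))
return sketch_ptr_err(d);" rather than assuming -ENOMEM.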

> > +	/* the nid information was represented by the offset of pointer(NULL+nid) */
> > +	if (!debugfs_create_file("add_memory", S_IWUSR, node_debug_root,
> > +			NULL + nid, &add_memory_file_ops))
> > +		return -ENOMEM;
> > +
> > +	return 0;
> > +}
> > +
> > +static int __init memory_debug_init(void)
> > +{
> > +	int nid;
> > +
> > +	if (!memhp_debug_root)
> > +		memhp_debug_root = debugfs_create_dir("mem_hotplug", NULL);
> > +	if (!memhp_debug_root)
> > +		return -ENOMEM;
> > +
> > +	for_each_online_node(nid)
> > +		 debugfs_create_add_memory_entry(nid);
> > +
> > +	return 0;
> > +}
> > +
> > +module_init(memory_debug_init);
> > +#else
> > +static debugfs_create_add_memory_entry(int nid)
> 
> "static int".
> 

Good catch.

> > +{
> > +	return 0;
> > +}
> > +#endif /* CONFIG_ARCH_MEMORY_PROBE */
> > +
> >  static ssize_t add_node_store(struct file *file, const char __user *buf,
> >  				size_t count, loff_t *ppos)
> >  {
> > @@ -963,6 +1038,8 @@
> >  		return -ENOMEM;
> >  
> >  	ret = add_memory(nid, start, size);
> > +
> > +	debugfs_create_add_memory_entry(nid);
> >  	return ret ? ret : count;
> >  }
> >  
> >
> > ...
> >

-- 
Thanks & Regards,
Shaohui


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [3/7, v9] NUMA Hotplug Emulator: Add node hotplug emulation
  2010-12-23  1:38       ` David Rientjes
@ 2010-12-23  2:20         ` Andrew Morton
  -1 siblings, 0 replies; 61+ messages in thread
From: Andrew Morton @ 2010-12-23  2:20 UTC (permalink / raw)
  To: David Rientjes
  Cc: linux-mm, linux-kernel, Haicheng Li, lethal, Andi Kleen,
	Shaohui Zheng, dave, Greg KH

On Wed, 22 Dec 2010 17:38:44 -0800 (PST) David Rientjes <rientjes@google.com> wrote:

> On Wed, 22 Dec 2010, Andrew Morton wrote:
> 
> > > Index: linux-hpe4/mm/memory_hotplug.c
> > > ===================================================================
> > > --- linux-hpe4.orig/mm/memory_hotplug.c	2010-11-30 12:40:43.757622001 +0800
> > > +++ linux-hpe4/mm/memory_hotplug.c	2010-11-30 14:02:33.877622002 +0800
> > > @@ -924,3 +924,63 @@
> > >  }
> > >  #endif /* CONFIG_MEMORY_HOTREMOVE */
> > >  EXPORT_SYMBOL_GPL(remove_memory);
> > > +
> > > +#ifdef CONFIG_DEBUG_FS
> > > +#include <linux/debugfs.h>
> > > +
> > > +static struct dentry *memhp_debug_root;
> > > +
> > > +static ssize_t add_node_store(struct file *file, const char __user *buf,
> > > +				size_t count, loff_t *ppos)
> > > +{
> > > +	nodemask_t mask;
> > 
> > NODEMASK_ALLOC()?
> > 
> 
> We traditionally haven't been using NODEMASK_ALLOC() in sysfs (or, in this 
> case, debugfs) functions because they're never deep in a call chain.  Even 
> for 4K node support, which isn't a supported config on any arch that 
> allows CONFIG_MEMORY_HOTPLUG, this would only be 512 bytes on the short 
> stack.

I bet linux-2.6.227 supports a meganode.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [5/7, v9] NUMA Hotplug Emulator: Support cpu probe/release in x86_64
  2010-12-23  3:21         ` Andrew Morton
@ 2010-12-23  2:24           ` Shaohui Zheng
  -1 siblings, 0 replies; 61+ messages in thread
From: Shaohui Zheng @ 2010-12-23  2:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: shaohui.zheng, linux-mm, linux-kernel, haicheng.li, lethal, ak,
	rientjes, dave, gregkh, Ingo Molnar, Len Brown, Yinghai Lu,
	Tejun Heo, Haicheng Li

On Wed, Dec 22, 2010 at 07:21:18PM -0800, Andrew Morton wrote:
> > > 
> > > checkpatch?
> > 
> > it is a warning, so I ignore it.
> 
> Don't ignore warnings!  At least, not until you've understood the
> reason for them and have a *reason* to ignore them.
> 
> simple_strtoul() will silently accept input of the form "42foo",
> treating it as "42".  That's a userspace bug and the kernel should
> report it.  This means that the code should be changed to handle error
> returns from strict_strtoul().  And those error paths should be tested.
> 

> > > > +			break;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	if (selected >= num_possible_cpus()) {
> > > > +		printk(KERN_ERR "No free cpu, give up cpu probing.\n");
> > > > +		return -EPERM;
> > > > +	}
> > > > +
> > > > +	/* register cpu */
> > > > +	arch_register_cpu_node(selected, nid);
> > > > +	acpi_map_lsapic_emu(selected, nid);
> > > > +
> > > > +	return count;
> > > > +}
> > > > +EXPORT_SYMBOL(arch_cpu_probe);
> > > 
> > > arch_cpu_probe() is global and exported to modules, but is undocumented.
> > > 
> > > If it had been documented, I might have been able to work out why arg
> > > `count' is checked, but never used.
> > > 
> > 
> > Sorry, Andrew, I did not catch it. Do you mean to add the document before
> >  the definition of the function arch_cpu_probe?
> 
> Sure, add a comment documenting the function.

I understand. I will add comments for both arch_cpu_probe and arch_cpu_release.

> 
> Why *does* it check `count' and then not use it?
> 

It is a tricky thing. When I debugged it under a virtual machine and did a cpu
probe via the sysfs cpu/probe interface, the function arch_cpu_probe was called
__three__ times, but only one call was valid, so I added a check on `count` to
ignore the invalid calls.

> > 
> > > > +	/* cpu 0 is not hotplugable */
> > > > +	if (cpu == 0) {
> > > > +		printk(KERN_ERR "can not release cpu 0.\n");
> > > 
> > > It's generally better to make kernel messages self-identifying. 
> > > Especially error messages.  If someone comes along and sees "can not
> > > release cpu 0" in their logs, they don't have a clue what caused it
> > > unless they download the kernel sources and go grepping.
> > > 
> > 
> > How about "arch_cpu_release: can not release cpu 0.\n"?
> 
> Better, although "arch_cpu_release" isn't very meaningful to an
> administrator.  "NUMA hotplug remove" or something like that would be
> more useful.

> 
> All these messages should be looked at from the point of view of the
> people who they are to serve.  Although in this special case, that's
> most likely to be a kernel developer so I guess such clarity isn't
> needed.
> 

It is a good lesson for me. When I meet a similar problem next time, I should
consider it more from the user's point of view.

-- 
Thanks & Regards,
Shaohui


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [5/7, v9] NUMA Hotplug Emulator: Support cpu probe/release in x86_64
  2010-12-23  1:34       ` Shaohui Zheng
@ 2010-12-23  3:21         ` Andrew Morton
  -1 siblings, 0 replies; 61+ messages in thread
From: Andrew Morton @ 2010-12-23  3:21 UTC (permalink / raw)
  To: Shaohui Zheng
  Cc: shaohui.zheng, linux-mm, linux-kernel, haicheng.li, lethal, ak,
	rientjes, dave, gregkh, Ingo Molnar, Len Brown, Yinghai Lu,
	Tejun Heo, Haicheng Li

On Thu, 23 Dec 2010 09:34:10 +0800 Shaohui Zheng <shaohui.zheng@linux.intel.com> wrote:

> On Wed, Dec 22, 2010 at 04:27:27PM -0800, Andrew Morton wrote:
> > On Fri, 10 Dec 2010 15:31:24 +0800
> > > +
> > > +ssize_t arch_cpu_probe(const char *buf, size_t count)
> > > +{
> > > +	int nid = 0;
> > > +	int num = 0, selected = 0;
> > 
> > One definition per line make for more maintainable code.
> > 
> > Two of these initialisations are unnecessary.
> > 
> Agree, I will put them into 2 lines, and remove the initialisations.
> I always try to initialize them when we define it, it seems that it is a bad habit.
> 
> > > +	/* check parameters */
> > > +	if (!buf || count < 2)
> > > +		return -EPERM;
> > > +
> > > +	nid = simple_strtoul(buf, NULL, 0);
> > 
> > checkpatch?
> 
> it is a warning, so I ignore it.

Don't ignore warnings!  At least, not until you've understood the
reason for them and have a *reason* to ignore them.

simple_strtoul() will silently accept input of the form "42foo",
treating it as "42".  That's a userspace bug and the kernel should
report it.  This means that the code should be changed to handle error
returns from strict_strtoul().  And those error paths should be tested.
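
The difference between lenient and strict parsing can be sketched in user
space with the C library's strtoul().  This is only an illustration of the
behaviour described above, not the kernel's strict_strtoul();
parse_ulong_strict() is a hypothetical helper, and accepting one trailing
newline is an assumption made here so that input written with "echo" parses
cleanly.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse s as an unsigned long (base auto-detected).  Unlike a bare
 * strtoul() call, reject input with trailing junk such as "42foo".
 * Returns 0 on success or a negative errno value on failure.
 */
static int parse_ulong_strict(const char *s, unsigned long *out)
{
	char *end;
	unsigned long val;

	assert(out != NULL);
	if (!s || *s == '\0')
		return -EINVAL;

	errno = 0;
	val = strtoul(s, &end, 0);
	if (errno == ERANGE)
		return -ERANGE;
	if (end == s)		/* no digits consumed at all */
		return -EINVAL;
	if (*end == '\n')	/* tolerate a single trailing newline */
		end++;
	if (*end != '\0')	/* anything else left over is an error */
		return -EINVAL;

	*out = val;
	return 0;
}
```

A caller then has an error return to propagate to userspace, and both the
success and failure paths can be exercised with inputs such as "42" and
"42foo".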

> > > +			break;
> > > +		}
> > > +	}
> > > +
> > > +	if (selected >= num_possible_cpus()) {
> > > +		printk(KERN_ERR "No free cpu, give up cpu probing.\n");
> > > +		return -EPERM;
> > > +	}
> > > +
> > > +	/* register cpu */
> > > +	arch_register_cpu_node(selected, nid);
> > > +	acpi_map_lsapic_emu(selected, nid);
> > > +
> > > +	return count;
> > > +}
> > > +EXPORT_SYMBOL(arch_cpu_probe);
> > 
> > arch_cpu_probe() is global and exported to modules, but is undocumented.
> > 
> > If it had been documented, I might have been able to work out why arg
> > `count' is checked, but never used.
> > 
> 
> Sorry, Andrew, I did not catch it. Do you mean to add the document before
>  the definition of the function arch_cpu_probe?

Sure, add a comment documenting the function.

Why *does* it check `count' and then not use it?

> 
> > > +	/* cpu 0 is not hotplugable */
> > > +	if (cpu == 0) {
> > > +		printk(KERN_ERR "can not release cpu 0.\n");
> > 
> > It's generally better to make kernel messages self-identifying. 
> > Especially error messages.  If someone comes along and sees "can not
> > release cpu 0" in their logs, they don't have a clue what caused it
> > unless they download the kernel sources and go grepping.
> > 
> 
> How about "arch_cpu_release: can not release cpu 0.\n"?

Better, although "arch_cpu_release" isn't very meaningful to an
administrator.  "NUMA hotplug remove" or something like that would be
more useful.

All these messages should be looked at from the point of view of the
people who they are to serve.  Although in this special case, that's
most likely to be a kernel developer so I guess such clarity isn't
needed.
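
One way to get self-identifying messages is to build every message in a file
from a common subsystem prefix.  In the kernel this is typically done by
defining pr_fmt() before using the pr_*() helpers; the user-space sketch below
only shows the shape of the idea.  HOTPLUG_LOG_PREFIX and
format_release_error() are hypothetical names, and the prefix text is the one
suggested above.

```c
#include <assert.h>
#include <stdio.h>

/* Subsystem prefix shared by all messages from this file (illustrative). */
#define HOTPLUG_LOG_PREFIX "NUMA hotplug remove: "

/*
 * Format the "cannot release cpu" error with the subsystem prefix.
 * Returns the number of characters snprintf() would have written.
 */
static int format_release_error(char *buf, size_t len, int cpu)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, HOTPLUG_LOG_PREFIX "cannot release cpu %d\n", cpu);
}
```

Someone reading the log then sees which subsystem complained without having
to grep the sources.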



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [5/7, v9] NUMA Hotplug Emulator: Support cpu probe/release in x86_64
  2010-12-23  5:28             ` Andrew Morton
@ 2010-12-23  4:30               ` Shaohui Zheng
  -1 siblings, 0 replies; 61+ messages in thread
From: Shaohui Zheng @ 2010-12-23  4:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: shaohui.zheng, linux-mm, linux-kernel, haicheng.li, lethal, ak,
	rientjes, dave, gregkh, Ingo Molnar, Len Brown, Yinghai Lu,
	Tejun Heo, Haicheng Li

On Wed, Dec 22, 2010 at 09:28:04PM -0800, Andrew Morton wrote:
> On Thu, 23 Dec 2010 10:24:28 +0800 Shaohui Zheng <shaohui.zheng@linux.intel.com> wrote:
> 
> > > 
> > > Why *does* it check `count' and then not use it?
> > > 
> > 
> > It is a tricky thing. When I debug it under a virtual machine and do a CPU
> > probe via the sysfs cpu/probe interface, the function arch_cpu_probe is called
> > __three__ times, but only one call is valid, so I added a check on `count` to
> > ignore the invalid calls.
> 
> hm, why does it get called three times?  Is that something which
> can/should be fixed in callers rather than in the callee?

It might be a bug in the caller, but that is only a guess for now. I will investigate it.

-- 
Thanks & Regards,
Shaohui


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [6/7, v9] NUMA Hotplug Emulator: Fake CPU socket with logical CPU on x86
  2010-12-23  0:27     ` Andrew Morton
@ 2010-12-23  5:10       ` Shaohui Zheng
  -1 siblings, 0 replies; 61+ messages in thread
From: Shaohui Zheng @ 2010-12-23  5:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: shaohui.zheng, linux-mm, linux-kernel, haicheng.li, lethal, ak,
	rientjes, dave, gregkh, Sam Ravnborg, Haicheng Li

On Wed, Dec 22, 2010 at 04:27:32PM -0800, Andrew Morton wrote:
> >  static struct task_struct *idle_thread_array[NR_CPUS] __cpuinitdata ;
> > @@ -198,6 +200,8 @@
> >  {
> >  	int cpuid, phys_id;
> >  	unsigned long timeout;
> > +	u8 cpu_probe_on = 0;
> 
> Unneeded initialisation.
> 
> Does this cause an unused var warning when
> CONFIG_ARCH_CPU_PROBE_RELEASE=n?
> 

I was trying to avoid too many ifdefs here, but it seems this does trigger an unused
variable warning when CONFIG_ARCH_CPU_PROBE_RELEASE=n. Good catch.

I will figure out a better method.
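
One common pattern that avoids both the #ifdef clutter and the warning is to
hide the conditional behind a helper that has a stub when the option is off.
The following is a userspace sketch only: cpu_probe_requested() and
boot_path() are hypothetical names, and CONFIG_ARCH_CPU_PROBE_RELEASE is
treated as a plain preprocessor macro rather than a real Kconfig symbol.

```c
#include <stdbool.h>

/* Define this macro (e.g. -DCONFIG_ARCH_CPU_PROBE_RELEASE) to enable the feature. */
#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
static bool cpu_probe_requested(int cpu)
{
	return cpu != 0;	/* placeholder policy for this sketch */
}
#else
/* Stub: always false, so the compiler can drop the dependent code. */
static inline bool cpu_probe_requested(int cpu)
{
	(void)cpu;
	return false;
}
#endif

/*
 * The caller has no #ifdef and no variable that is used in only one
 * configuration: cpu_probe_on is read in both builds.
 * Returns 1 if the fake-socket path would be taken, 0 otherwise.
 */
static int boot_path(int cpu)
{
	bool cpu_probe_on = cpu_probe_requested(cpu);

	if (cpu_probe_on)
		return 1;
	return 0;
}
```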

> > +	struct cpuinfo_x86 *c;
> >  
> >  	/*
> >  	 * If waken up by an INIT in an 82489DX configuration
> >
> > ...
> >
> > +#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
> > +/*
> > + * Put the logical cpu into a new sokect, and encapsule it into core 0.
> 
> That comment needs help.
> 

Agreed, the comment is too terse; I will add better documentation for the function
fake_cpu_socket_info.

-- 
Thanks & Regards,
Shaohui


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [5/7, v9] NUMA Hotplug Emulator: Support cpu probe/release in x86_64
  2010-12-23  2:24           ` Shaohui Zheng
@ 2010-12-23  5:28             ` Andrew Morton
  -1 siblings, 0 replies; 61+ messages in thread
From: Andrew Morton @ 2010-12-23  5:28 UTC (permalink / raw)
  To: Shaohui Zheng
  Cc: shaohui.zheng, linux-mm, linux-kernel, haicheng.li, lethal, ak,
	rientjes, dave, gregkh, Ingo Molnar, Len Brown, Yinghai Lu,
	Tejun Heo, Haicheng Li

On Thu, 23 Dec 2010 10:24:28 +0800 Shaohui Zheng <shaohui.zheng@linux.intel.com> wrote:

> > 
> > Why *does* it check `count' and then not use it?
> > 
> 
> It is a tricky thing. When I debug it under a virtual machine and do a CPU
> probe via the sysfs cpu/probe interface, the function arch_cpu_probe is called
> __three__ times, but only one call is valid, so I added a check on `count` to
> ignore the invalid calls.

hm, why does it get called three times?  Is that something which
can/should be fixed in callers rather than in the callee?


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [3/7, v9] NUMA Hotplug Emulator: Add node hotplug emulation
  2010-12-23  0:27     ` Andrew Morton
@ 2010-12-28  7:34       ` David Rientjes
  -1 siblings, 0 replies; 61+ messages in thread
From: David Rientjes @ 2010-12-28  7:34 UTC (permalink / raw)
  To: Shaohui Zheng, Andrew Morton
  Cc: linux-mm, linux-kernel, haicheng.li, lethal, Andi Kleen, dave,
	Greg Kroah-Hartman, Haicheng Li

On Wed, 22 Dec 2010, Andrew Morton wrote:

> > Index: linux-hpe4/mm/memory_hotplug.c
> > ===================================================================
> > --- linux-hpe4.orig/mm/memory_hotplug.c	2010-11-30 12:40:43.757622001 +0800
> > +++ linux-hpe4/mm/memory_hotplug.c	2010-11-30 14:02:33.877622002 +0800
> > @@ -924,3 +924,63 @@
> >  }
> >  #endif /* CONFIG_MEMORY_HOTREMOVE */
> >  EXPORT_SYMBOL_GPL(remove_memory);
> > +
> > +#ifdef CONFIG_DEBUG_FS
> > +#include <linux/debugfs.h>
> > +
> > +static struct dentry *memhp_debug_root;
> > +
> > +static ssize_t add_node_store(struct file *file, const char __user *buf,
> > +				size_t count, loff_t *ppos)
> > +{
> > +	nodemask_t mask;
> 
> NODEMASK_ALLOC()?
> 
> > +	u64 start, size;
> > +	char buffer[64];
> > +	char *p;
> > +	int nid;
> > +	int ret;
> > +
> > +	memset(buffer, 0, sizeof(buffer));
> > +	if (count > sizeof(buffer) - 1)
> > +		count = sizeof(buffer) - 1;
> 
> This will cause the write to return a smaller number than `count': a
> short write.  Some userspace code may then decide to write the
> > remainder of the data (which is the correct way to use the write()
> syscall).
> 
> Could be a bit dangerous, and perhaps simply declaring an error if too
> much data was written would be a better approach.
> 
> > +	if (copy_from_user(buffer, buf, count))
> > +		return -EFAULT;
> > +
> > +	size = memparse(buffer, &p);
> > +	if (size < (PAGES_PER_SECTION << PAGE_SHIFT))
> 
> PAGES_PER_SECTION has type unsigned long, so the rhs of this comparison
> might overflow on 32-bit, should anyone ever try to use this code on
> 32-bit.
> 
> otoh the compiler might do it as 64-bit because the lhs is 64-bit.  Not
> sure.
> 
> > +		return -EINVAL;
> > +	if (*p != '@')
> > +		return -EINVAL;
> > +
> > +	start = simple_strtoull(p + 1, NULL, 0);
> 
> You disagreed with checkpatch?
> 
> > +	nodes_andnot(mask, node_possible_map, node_online_map);
> > +	nid = first_node(mask);
> > +	if (nid == MAX_NUMNODES)
> > +		return -ENOMEM;
> > +
> > +	ret = add_memory(nid, start, size);
> > +	return ret ? ret : count;
> > +}
> > +
> > +static const struct file_operations add_node_file_ops = {
> > +	.write		= add_node_store,
> > +	.llseek		= generic_file_llseek,
> > +};
> > +
> > +static int __init node_debug_init(void)
> > +{
> > +	if (!memhp_debug_root)
> > +		memhp_debug_root = debugfs_create_dir("mem_hotplug", NULL);
> > +	if (!memhp_debug_root)
> > +		return -ENOMEM;
> > +
> > +	if (!debugfs_create_file("add_node", S_IWUSR, memhp_debug_root,
> > +			NULL, &add_node_file_ops))
> > +		return -ENOMEM;
> > +
> > +	return 0;
> > +}
> > +
> > +module_init(node_debug_init);
> > +#endif /* CONFIG_DEBUG_FS */

Shaohui, I'll reply to this message with an updated version of this patch 
to address Andrew's comments.  You can merge it into your series or Andrew 
can take it separately (although it doesn't do much good without "x86: add 
numa=possible command line option" unless you have hotpluggable SRAT 
entries and CONFIG_ACPI_NUMA).
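
On the 32-bit overflow question quoted above: in C a shift is evaluated in
the type of its (promoted) left operand, independent of the 64-bit value on
the other side of the comparison, so any widening has to happen before the
shift. The sketch below uses fixed-width types to stand in for a 32-bit
unsigned long; the function names and the values used with them are
illustrative assumptions, not the real PAGES_PER_SECTION of any architecture.

```c
#include <stdint.h>

/* Shift carried out in 32 bits, as `unsigned long` would be on a 32-bit build. */
static uint64_t section_bytes_narrow(uint32_t pages_per_section, unsigned int page_shift)
{
	uint32_t bytes = pages_per_section << page_shift;	/* may wrap modulo 2^32 */

	return bytes;
}

/* Widen first, then shift: the result keeps all 64 bits. */
static uint64_t section_bytes_wide(uint32_t pages_per_section, unsigned int page_shift)
{
	return (uint64_t)pages_per_section << page_shift;
}
```

For example, 2^20 pages with a shift of 12 needs 33 bits, so the narrow
version wraps while the widened version does not.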

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [patch] mm: add node hotplug emulation
  2010-12-28  7:34       ` David Rientjes
@ 2010-12-28  7:34         ` David Rientjes
  -1 siblings, 0 replies; 61+ messages in thread
From: David Rientjes @ 2010-12-28  7:34 UTC (permalink / raw)
  To: Shaohui Zheng, Andrew Morton
  Cc: linux-mm, linux-kernel, haicheng.li, lethal, Andi Kleen, dave,
	Greg Kroah-Hartman, Haicheng Li

Add an interface to allow new nodes to be added when performing memory
hot-add.  This provides a convenient interface to test memory hotplug
notifier callbacks and surrounding hotplug code when new nodes are
onlined without actually having a machine with such hotpluggable SRAT
entries.

This adds a new debugfs interface at /sys/kernel/debug/hotplug/add_node
that behaves in a similar way to the memory hot-add "probe" interface.
Its format is size@start, where "size" is the size of the new node to be
added and "start" is the physical address of the new memory.
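
The size@start format can be sketched in userspace C as follows. This is an
approximation under stated assumptions: strtoull() plus a K/M/G suffix switch
stands in for the kernel's memparse() and strict_strtoull(), and
parse_add_node() and ADD_NODE_BUF_LEN are hypothetical names. Over-long input
is rejected with an error instead of being truncated, which avoids the short
write raised in review.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ADD_NODE_BUF_LEN 128	/* same bound as the buffer in the patch */

/*
 * Parse "size@start", e.g. "128M@0x80000000".
 * Returns 0 on success or a negative errno value.
 */
static int parse_add_node(const char *input, size_t len,
			  uint64_t *size, uint64_t *start)
{
	char buf[ADD_NODE_BUF_LEN];
	char *p, *end;

	if (len > sizeof(buf) - 1)
		return -EINVAL;
	memcpy(buf, input, len);
	buf[len] = '\0';

	/* Size: a number with an optional K, M or G suffix. */
	errno = 0;
	*size = strtoull(buf, &p, 0);
	if (p == buf || errno)
		return -EINVAL;
	switch (*p) {
	case 'G': case 'g':
		*size <<= 10;
		/* fall through */
	case 'M': case 'm':
		*size <<= 10;
		/* fall through */
	case 'K': case 'k':
		*size <<= 10;
		p++;
		break;
	default:
		break;
	}

	if (*p != '@')
		return -EINVAL;
	p++;

	/* Start address: a complete number, allowing one trailing newline. */
	errno = 0;
	*start = strtoull(p, &end, 0);
	if (end == p || errno)
		return -EINVAL;
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;
	return 0;
}
```

The trailing newline is tolerated because echo appends one to the string it
writes.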

The new node id is a currently offline, but possible, node.  The bit must
be set in node_possible_map so that nr_node_ids is sized appropriately.
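
The node selection rule (first node that is possible but not yet online) can
be illustrated with plain bitmasks. This is a userspace sketch, assuming at
most 64 nodes; first_offline_possible_node() and MAX_NODES are hypothetical
stand-ins for the nodes_andnot()/first_node() pair and MAX_NUMNODES used in
the patch.

```c
#include <stdint.h>

#define MAX_NODES 64	/* illustrative limit for this sketch */

/*
 * Return the lowest node id that is possible but not online,
 * or MAX_NODES if every possible node is already online.
 * Bit n of each mask represents node n.
 */
static int first_offline_possible_node(uint64_t possible, uint64_t online)
{
	uint64_t candidates = possible & ~online;
	int nid;

	for (nid = 0; nid < MAX_NODES; nid++)
		if (candidates & (UINT64_C(1) << nid))
			return nid;
	return MAX_NODES;
}
```

With nodes 0 to 4 possible and only node 0 online, the function picks node 1,
which matches the "On node 1" line in the example output further down.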

For emulation on x86, for example, it would be possible to set aside
memory for hotplugged nodes (say, anything above 2G) and to add an
additional four nodes as being possible on boot with

	mem=2G numa=possible=4

and then creating a new 128M node at runtime:

	# echo 128M@0x80000000 > /sys/kernel/debug/hotplug/add_node
	On node 1 totalpages: 0
	init_memory_mapping: 0000000080000000-0000000088000000
	 0080000000 - 0088000000 page 2M

Once the new node has been added, its memory can be onlined.  If this
memory represents memory section 16, for example:

	# echo online > /sys/devices/system/memory/memory16/state
	Built 2 zonelists in Node order, mobility grouping on.  Total pages: 514846
	Policy zone: Normal

 [ The memory section(s) mapped to a particular node are visible via
   /sys/devices/system/node/node1, in this example. ]

The new node is now hotplugged and ready for testing.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 Documentation/memory-hotplug.txt |   24 +++++++++++++
 mm/memory_hotplug.c              |   69 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 93 insertions(+), 0 deletions(-)

diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -18,6 +18,7 @@ be changed often.
 4. Physical memory hot-add phase
   4.1 Hardware(Firmware) Support
   4.2 Notify memory hot-add event by hand
+  4.3 Node hotplug emulation
 5. Logical Memory hot-add phase
   5.1. State of memory
   5.2. How to online memory
@@ -215,6 +216,29 @@ current implementation). You'll have to online memory by yourself.
 Please see "How to online memory" in this text.
 
 
+4.3 Node hotplug emulation
+------------
+With debugfs, it is possible to test node hotplug by assigning the newly
+added memory to a new node id when using a different interface with a similar
+behavior to "probe" described in section 4.2.  If a node id is possible
+(there are bits in /sys/devices/system/node/possible that are not online),
+then it may be used to emulate a newly added node as the result of memory
+hotplug by using the debugfs "add_node" interface.
+
+The add_node interface is located at "hotplug/add_node" at the debugfs mount
+point.
+
+You can create a new node of a specified size starting at the physical
+address of new memory by
+
+% echo size@start_address_of_new_memory > /sys/kernel/debug/hotplug/add_node
+
+Where "size" can be represented in megabytes or gigabytes (for example,
+"128M" or "1G").  The minimum size is that of a memory section.
+
+Once the new node has been added, it is possible to online the memory by
+toggling the "state" of its memory section(s) as described in section 5.1.
+
 
 ------------------------------
 5. Logical Memory hot-add phase
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -927,3 +927,72 @@ int remove_memory(u64 start, u64 size)
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 EXPORT_SYMBOL_GPL(remove_memory);
+
+#ifdef CONFIG_DEBUG_FS
+#include <linux/debugfs.h>
+
+static struct dentry *hotplug_debug_root;
+
+static ssize_t add_node_store(struct file *file, const char __user *buf,
+				size_t count, loff_t *ppos)
+{
+	NODEMASK_ALLOC(nodemask_t, mask, GFP_KERNEL);
+	u64 start, size;
+	char buffer[128];
+	char *p;
+	int nid;
+	int ret;
+
+	if (!mask)
+		return -ENOMEM;
+	memset(buffer, 0, sizeof(buffer));
+	if (count > sizeof(buffer) - 1) {
+		ret = -EINVAL;
+		goto out;
+	}
+	if (copy_from_user(buffer, buf, count)) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	ret = -EINVAL;
+	size = memparse(buffer, &p);
+	if (size < ((u64)PAGES_PER_SECTION << PAGE_SHIFT))
+		goto out;
+	if (*p != '@')
+		goto out;
+	if (strict_strtoull(p + 1, 0, &start) < 0)
+		goto out;
+
+	ret = -ENOMEM;
+	nodes_andnot(*mask, node_possible_map, node_online_map);
+	nid = first_node(*mask);
+	if (nid == MAX_NUMNODES)
+		goto out;
+
+	ret = add_memory(nid, start, size);
+out:
+	NODEMASK_FREE(mask);
+	return ret ? ret : count;
+}
+
+static const struct file_operations add_node_file_ops = {
+	.write		= add_node_store,
+	.llseek		= generic_file_llseek,
+};
+
+static int __init hotplug_debug_init(void)
+{
+	hotplug_debug_root = debugfs_create_dir("hotplug", NULL);
+	if (!hotplug_debug_root)
+		return -ENOMEM;
+
+	if (!debugfs_create_file("add_node", S_IWUSR, hotplug_debug_root,
+			NULL, &add_node_file_ops))
+		return -ENOMEM;
+
+	return 0;
+}
+
+module_init(hotplug_debug_init);
+#endif /* CONFIG_DEBUG_FS */

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [3/7, v9] NUMA Hotplug Emulator: Add node hotplug emulation
  2010-12-28  7:34       ` David Rientjes
@ 2010-12-29  2:31         ` Zheng, Shaohui
  -1 siblings, 0 replies; 61+ messages in thread
From: Zheng, Shaohui @ 2010-12-29  2:31 UTC (permalink / raw)
  To: David Rientjes, Andrew Morton
  Cc: linux-mm, linux-kernel, haicheng.li, lethal, Andi Kleen, dave,
	Greg Kroah-Hartman, Li, Haicheng


> -----Original Message-----
> From: David Rientjes [mailto:rientjes@google.com]
> Sent: Tuesday, December 28, 2010 3:35 PM
> To: Zheng, Shaohui; Andrew Morton
> Cc: linux-mm@kvack.org; linux-kernel@vger.kernel.org; haicheng.li@linux.intel.com; lethal@linux-sh.org; Andi Kleen;
> dave@linux.vnet.ibm.com; Greg Kroah-Hartman; Li, Haicheng
> Subject: Re: [3/7, v9] NUMA Hotplug Emulator: Add node hotplug emulation
> 
> 
> Shaohui, I'll reply to this message with an updated version of this patch
> to address Andrew's comments.  You can merge it into your series or Andrew
> can take it separately (although it doesn't do much good without "x86: add
> numa=possible command line option" unless you have hotpluggable SRAT
> entries and CONFIG_ACPI_NUMA).


Okay, thanks David. I will merge it into my series when I send the next version.

Thanks & Regards,
Shaohui

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [0/7, v9] NUMA Hotplug Emulator (v9)
  2010-12-10  7:31 ` shaohui.zheng
@ 2011-02-22 22:31   ` David Rientjes
  -1 siblings, 0 replies; 61+ messages in thread
From: David Rientjes @ 2011-02-22 22:31 UTC (permalink / raw)
  To: Shaohui Zheng
  Cc: Andrew Morton, linux-mm, linux-kernel, haicheng.li, lethal,
	Andi Kleen, dave, Greg Kroah-Hartman

On Fri, 10 Dec 2010, shaohui.zheng@intel.com wrote:

> v9:
> 
> Fix the bug reported by Eric B Munson: check the return value of cpu_down when
> doing a CPU release.
> 
> Resolve the conflicts with Tejun Heo's NUMA unification code; rework patch 5 based on his
> patch.
> 
> Some small changes on debugfs per-node add_memory interface.
> 

Hi Shaohui,

Tejun's NUMA unification work has been merged into x86/mm, so I think it 
would be possible to rebase your hotplug emulator patchset on top of it 
without too many conflicts now.

It should probably be based on x86/mm from 
http://git.kernel.org/?p=linux/kernel/git/mingo/linux-2.6-x86.git

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [0/7, v9] NUMA Hotplug Emulator (v9)
  2011-02-22 22:31   ` David Rientjes
@ 2011-02-23  3:29     ` Haicheng Li
  -1 siblings, 0 replies; 61+ messages in thread
From: Haicheng Li @ 2011-02-23  3:29 UTC (permalink / raw)
  To: David Rientjes
  Cc: Shaohui Zheng, Andrew Morton, linux-mm, linux-kernel, lethal,
	Andi Kleen, dave, Greg Kroah-Hartman, yang.z.zhang, You,
	Yongkang

Shaohui has been away recently. I am adding Yang Zhang and Yongkang You, who are
Shaohui's backups, to this thread.

David Rientjes wrote:
> On Fri, 10 Dec 2010, shaohui.zheng@intel.com wrote:
> 
>> v9:
>>
>> Fix the bug reported by Eric B Munson: check the return value of cpu_down when
>> doing a CPU release.
>>
>> Resolve the conflicts with Tejun Heo's NUMA unification code; rework patch 5 based on his
>> patch.
>>
>> Some small changes on debugfs per-node add_memory interface.
>>
> 
> Hi Shaohui,
> 
> Tejun's NUMA unification work has been merged into x86/mm, so I think it
> would be possible to rebase your hotplug emulator patchset on top of it
> without too many conflicts now.
> 
> It should probably be based on x86/mm from 
> http://git.kernel.org/?p=linux/kernel/git/mingo/linux-2.6-x86.git

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [0/7, v9] NUMA Hotplug Emulator (v9)
  2011-02-23  3:29     ` Haicheng Li
@ 2011-02-23  5:29       ` Zhang, Yang Z
  -1 siblings, 0 replies; 61+ messages in thread
From: Zhang, Yang Z @ 2011-02-23  5:29 UTC (permalink / raw)
  To: Haicheng Li, David Rientjes
  Cc: Zheng, Shaohui, Andrew Morton, linux-mm, linux-kernel, lethal,
	Andi Kleen, dave, Greg Kroah-Hartman, You, Yongkang

I am rebasing the patchset now and will send it out when I finish.

Best regards,
Yang


> -----Original Message-----
> From: Haicheng Li [mailto:haicheng.li@linux.intel.com]
> Sent: Wednesday, February 23, 2011 11:30 AM
> To: David Rientjes
> Cc: Zheng, Shaohui; Andrew Morton; linux-mm@kvack.org;
> linux-kernel@vger.kernel.org; lethal@linux-sh.org; Andi Kleen;
> dave@linux.vnet.ibm.com; Greg Kroah-Hartman; Zhang, Yang Z; You, Yongkang
> Subject: Re: [0/7, v9] NUMA Hotplug Emulator (v9)
> 
> Shaohui has been away recently. I am adding Yang Zhang and Yongkang You,
> who are Shaohui's backups, to this thread.
> 
> David Rientjes wrote:
> > On Fri, 10 Dec 2010, shaohui.zheng@intel.com wrote:
> >
> >> v9:
> >>
> >> Fix the bug reported by Eric B Munson: check the return value of cpu_down when
> >> doing a CPU release.
> >>
> >> Resolve the conflicts with Tejun Heo's NUMA unification code; rework patch 5 based on his
> >> patch.
> >>
> >> Some small changes on debugfs per-node add_memory interface.
> >>
> >
> > Hi Shaohui,
> >
> > Tejun's NUMA unification work has been merged into x86/mm, so I think it
> > would be possible to rebase your hotplug emulator patchset on top of it
> > without too many conflicts now.
> >
> > It should probably be based on x86/mm from
> > http://git.kernel.org/?p=linux/kernel/git/mingo/linux-2.6-x86.git

^ permalink raw reply	[flat|nested] 61+ messages in thread

end of thread, other threads:[~2011-02-23  5:30 UTC | newest]

Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-10  7:31 [0/7, v9] NUMA Hotplug Emulator (v9) shaohui.zheng
2010-12-10  7:31 ` shaohui.zheng
2010-12-10  7:31 ` [1/7, v9] NUMA Hotplug Emulator: Documentation shaohui.zheng
2010-12-10  7:31   ` shaohui.zheng
2010-12-10  7:31 ` [2/7, v9] NUMA Hotplug Emulator: Add numa=possible option shaohui.zheng
2010-12-10  7:31   ` shaohui.zheng
2010-12-23  0:27   ` Andrew Morton
2010-12-23  0:27     ` Andrew Morton
2010-12-23  1:14     ` David Rientjes
2010-12-23  1:14       ` David Rientjes
2010-12-10  7:31 ` [3/7, v9] NUMA Hotplug Emulator: Add node hotplug emulation shaohui.zheng
2010-12-10  7:31   ` shaohui.zheng
2010-12-23  0:27   ` Andrew Morton
2010-12-23  0:27     ` Andrew Morton
2010-12-23  1:38     ` David Rientjes
2010-12-23  1:38       ` David Rientjes
2010-12-23  2:20       ` Andrew Morton
2010-12-23  2:20         ` Andrew Morton
2010-12-28  7:34     ` David Rientjes
2010-12-28  7:34       ` David Rientjes
2010-12-28  7:34       ` [patch] mm: add " David Rientjes
2010-12-28  7:34         ` David Rientjes
2010-12-29  2:31       ` [3/7, v9] NUMA Hotplug Emulator: Add " Zheng, Shaohui
2010-12-29  2:31         ` Zheng, Shaohui
2010-12-10  7:31 ` [4/7, v9] NUMA Hotplug Emulator: Abstract cpu register functions shaohui.zheng
2010-12-10  7:31   ` shaohui.zheng
2010-12-10  7:31 ` [5/7, v9] NUMA Hotplug Emulator: Support cpu probe/release in x86_64 shaohui.zheng
2010-12-10  7:31   ` shaohui.zheng
2010-12-16 16:25   ` Eric B Munson
2010-12-16 23:34     ` Shaohui Zheng
2010-12-16 23:34       ` Shaohui Zheng
2010-12-23  0:27   ` Andrew Morton
2010-12-23  0:27     ` Andrew Morton
2010-12-23  1:34     ` Shaohui Zheng
2010-12-23  1:34       ` Shaohui Zheng
2010-12-23  3:21       ` Andrew Morton
2010-12-23  3:21         ` Andrew Morton
2010-12-23  2:24         ` Shaohui Zheng
2010-12-23  2:24           ` Shaohui Zheng
2010-12-23  5:28           ` Andrew Morton
2010-12-23  5:28             ` Andrew Morton
2010-12-23  4:30             ` Shaohui Zheng
2010-12-23  4:30               ` Shaohui Zheng
2010-12-10  7:31 ` [6/7, v9] NUMA Hotplug Emulator: Fake CPU socket with logical CPU on x86 shaohui.zheng
2010-12-10  7:31   ` shaohui.zheng
2010-12-23  0:27   ` Andrew Morton
2010-12-23  0:27     ` Andrew Morton
2010-12-23  5:10     ` Shaohui Zheng
2010-12-23  5:10       ` Shaohui Zheng
2010-12-10  7:31 ` [7/7, v9] NUMA Hotplug Emulator: Implement per-node add_memory debugfs interface shaohui.zheng
2010-12-10  7:31   ` shaohui.zheng
2010-12-23  0:27   ` Andrew Morton
2010-12-23  0:27     ` Andrew Morton
2010-12-23  2:00     ` Shaohui Zheng
2010-12-23  2:00       ` Shaohui Zheng
2011-02-22 22:31 ` [0/7, v9] NUMA Hotplug Emulator (v9) David Rientjes
2011-02-22 22:31   ` David Rientjes
2011-02-23  3:29   ` Haicheng Li
2011-02-23  3:29     ` Haicheng Li
2011-02-23  5:29     ` Zhang, Yang Z
2011-02-23  5:29       ` Zhang, Yang Z
