* [0/7,v8] NUMA Hotplug Emulator (v8)
@ 2010-12-07  1:00 ` shaohui.zheng
  0 siblings, 0 replies; 41+ messages in thread
From: shaohui.zheng @ 2010-12-07  1:00 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
	dave, gregkh

* PATCHSET INTRODUCTION

patch 1: Documentation.
patch 2: Add a numa=possible=<N> command line option to set an additional N
		 nodes as being possible for memory hotplug.

patch 3: Add node hotplug emulation; introduce the debugfs node/add_node
		 interface.
patch 4: Abstract the CPU register functions to make these interfaces friendly
		 for CPU hotplug emulation.
patch 5: Support CPU probe/release on x86; this provides a software method to
		 hot-add/remove a CPU through a sysfs interface.
patch 6: Fake a CPU socket with a logical CPU on x86, to prevent the scheduling
		 domain from building an incorrect hierarchy.
patch 7: Implement the per-node add_memory debugfs interface.

* FEEDBACK & RESPONSES

v8:

Reconsidered David's proposal and accepted the per-node add_memory interface
in debugfs (patch 7).

v7:

David:    We don't need two different interfaces, one in sysfs and one in debugfs,
          to hotplug memory.
Response: We use debugfs for memory hotplug emulation only; we did not modify
          the sysfs memory probe interface, so we removed the original patch 7
          from the patchset.
David:    Suggested new probe files in debugfs for each online node:
			/sys/kernel/debug/mem_hotplug/add_node (already exists)
			/sys/kernel/debug/mem_hotplug/node0/add_memory
			/sys/kernel/debug/mem_hotplug/node1/add_memory

Response: We need not make a simple thing so complicated; we'd prefer to
          rename the mem_hotplug/probe interface to mem_hotplug/add_memory:
			/sys/kernel/debug/mem_hotplug/add_node (already exists)
			/sys/kernel/debug/mem_hotplug/add_memory (probe renamed to add_memory)

v6:

Greg KH:  Suggested using the interface mem_hotplug/add_node.
David:    Agreed with Greg's suggestion.
Response: We moved the interface from node/add_node to mem_hotplug/add_node, and
          we also moved the memory/probe interface to mem_hotplug/probe since
          both are related to memory hotplug.

Valdis Kletnieks: Suggested renumbering the patch series and moving patch 8/8
          to patch 1/8.
Response: Moved patch 8/8 to patch 1/8; we will include the full description in
          0/8 when we send patches in the future.
       

v5:

David: Suggested a flexible method for node hotplug emulation. After reviewing
       our two emulator implementations, David provided a better solution that
       addresses both the flexibility and the memory-wasting issues:

	   add a numa=possible=<N> command line option, provide the sysfs
	   interface /sys/devices/system/node/add_node, and move the interface to
	   debugfs as /sys/kernel/debug/hotplug/add_node after hearing the voice
	   of the community.

Greg KH: Move the interface from hotplug/add_node to node/add_node.

Response: Accepted David's numa=possible=<N> command line option. After talking
       with David, he agreed to add his patch to our patchset; thanks for
       David's solution (patch 1).

	   David's original interface /sys/kernel/debug/hotplug/add_node is not so
	   clear for node hotplug emulation, so we accepted Greg's suggestion and
	   moved the interface to node/add_node (patch 2).

Dave Hansen: For memory hotplug, Dave recalled Greg KH's advice and suggested
       that we use configfs instead of sysfs. After learning that it is just
       for test purposes, Dave thought debugfs would be the best fit.

Response: The memory probe sysfs interface already exists; we'd like to keep it
       and extend it to support adding memory on a specified node (patch 6).

	   We accepted Dave's suggestion and implemented the memory probe
	   interface with debugfs (patch 7).

Randy Dunlap: Corrected many grammatical errors in our documentation (patch 8).

Response: Thanks to Randy for the careful review; we have corrected them.

v4: 

Split out the CPU hotplug emulation code, since David has sent a patchset for
node hotplug emulation.

v3 & v2:

1) Patch 0
Balbir & Greg: Suggested using git/quilt to manage and send the patchset.
Response: Thanks for the recommendation. With help from Fengguang, I got quilt
		  working; it is a great tool.

2) Patch 2
Jaswinder Singh: The if (hidden_num) check is not required in patch 2.
Response: Good catch; it was removed in v2.


3) Patch 3
Dave Hansen: Suggested creating a dedicated sysfs file for each possible node.
Greg: 	  How big would this "list" be?  What will it look like exactly?
Haicheng: It should follow "one value per file". It intends to show the
		  acceptable parameters.

		  For example, if we have 4 fake offlined nodes, like node 2-5, then:
			   $ cat /sys/devices/system/node/probe
				 2-5

		  Then the user hot-adds node 3 to the system:
			   $ echo 3 > /sys/devices/system/node/probe
			   $ cat /sys/devices/system/node/probe
				 2,4-5
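The range-list format shown above ("2-5", "2,4-5") can be expanded in test
scripts with a small helper; a sketch in POSIX shell (expand_ranges is a
hypothetical name, not part of the patchset):

```shell
# expand_ranges: turn a kernel-style range list such as "2,4-5" into
# one node id per line (2, 4, 5). Hypothetical helper for scripting
# against the proposed probe file.
expand_ranges() {
	echo "$1" | tr ',' '\n' | while IFS=- read lo hi; do
		seq "$lo" "${hi:-$lo}"
	done
}
expand_ranges "2,4-5"
```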

Greg:   As you are trying to add a new sysfs file, please create the matching
		Documentation/ABI/ file as well.
Response: We missed it; it has been added in v2.

Patch 4 & 5: 
Paul Mundt: This looks like an incredibly painful interface. How about scrapping all
of this _emu() mess and just reworking the register_cpu() interface?
Response: We accepted Paul's suggestion and removed the CPU _emu functions.

Patch 7: 
Dave Hansen: If we're going to put multiple values into the file now and
		 add to the ABI, can we be more explicit about it?
		echo "physical_address=0x40000000 numa_node=3" > memory/probe
Response: Dave's new interface was accepted; moreover, we still keep the old
	      format for compatibility. We documented these interfaces in
		  Documentation/ABI in v2.
Greg: 	Suggested using configfs instead for the memory probe interface.
Andi: 	This is a debugging interface. It doesn't need to have the
	  	most pretty interface in the world, because it will be only used for
	  	QA by a few people. it's just a QA interface, not the next generation
		of POSIX.
Response: We still keep it as a sysfs interface since the node/cpu/memory probe
		  interfaces are all in sysfs; we can create another group of patches to
		  support configfs if there is a strong requirement in the future.

v1:

The RFC version of the NUMA Hotplug Emulator.

* WHAT IS HOTPLUG EMULATOR 

NUMA hotplug emulator is the collective name for the hotplug emulation
facilities; it can emulate NUMA node hotplug in a purely software way. It
intends to help people easily debug and test node/CPU/memory hotplug
related code on a machine without NUMA hotplug support, even a UMA machine.

The emulator provides a mechanism to emulate the process of physical CPU and
memory hot-add, making it possible for kernel developers to debug CPU and
memory hotplug on machines without NUMA support. It offers an interface for
CPU and memory hotplug testing.

* WHY DO WE USE HOTPLUG EMULATOR

We have been focusing on hotplug emulation for a few months. The emulator has
helped the team reproduce all the major hotplug bugs, and it plays an
important role in hotplug code quality assurance. Because of the hotplug
emulator, we have already moved most of the debugging work to a virtual
environment.

* Principles & Usages 

NUMA hotplug emulator consists of 3 parts: node, CPU, and memory hotplug emulation.

1) Node hotplug emulation:

Adds a numa=possible=<N> command line option to set an additional N nodes as
being possible for memory hotplug. This set of possible nodes controls
nr_node_ids and the sizes of several dynamically allocated node arrays.

This allows memory hotplug to create new nodes for newly added memory
rather than binding it to existing nodes.

For emulation on x86, it would be possible to set aside memory for hotplugged
nodes (say, anything above 2G) and to add an additional four nodes as being
possible on boot with

	mem=2G numa=possible=4

and then a new 128M node can be created at runtime:

	# echo 128M@0x80000000 > /sys/kernel/debug/node/add_node
	On node 1 totalpages: 0
	init_memory_mapping: 0000000080000000-0000000088000000
	 0080000000 - 0088000000 page 2M

Once the new node has been added, its memory can be onlined.  If this
memory represents memory section 16, for example:

	# echo online > /sys/devices/system/memory/memory16/state
	Built 2 zonelists in Node order, mobility grouping on.  Total pages: 514846
	Policy zone: Normal
 [ The memory section(s) mapped to a particular node are visible via
   /sys/devices/system/node/node1, in this example. ]
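For reference, a section number such as the 16 above maps to a physical start
address as follows; this is a sketch assuming the x86_64 default section size
of 128M (SECTION_SIZE_BITS = 27), and section_addr is a hypothetical helper:

```shell
# section_addr: physical start address of memory section N, assuming
# 128M (2^27 byte) sections as on x86_64.
section_addr() {
	printf '0x%x\n' $(( $1 << 27 ))
}
section_addr 16    # prints 0x80000000, i.e. 2G, matching the example above
```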

2) CPU hotplug emulation:

The emulator reserves CPUs through a boot parameter; the reserved CPUs can be
hot-added and hot-removed by software.

When hotplugging a CPU with the emulator, we use a logical CPU to emulate the
CPU hotplug process. On CPUs supporting SMT, some logical CPUs sit in the same
socket but may be located in a different NUMA node under the emulator. We
therefore put the logical CPU into a fake CPU socket and assign it a unique
phys_proc_id. Each fake socket holds only one logical CPU.

 - to hide CPUs
	- Use the boot option "maxcpus=N" to hide CPUs;
	  N is the number of CPUs to initialize.
	- Use the boot option "cpu_hpe=on" to enable CPU hotplug emulation;
	  when cpu_hpe is enabled, the remaining CPUs will not be initialized.

 - to hot-add a CPU to node <nid>
	# echo nid > cpu/probe

 - to hot-remove a CPU from node <nid>
	# echo nid > cpu/release
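After a probe, the new logical CPU still has to be brought online. A sketch of
batch-onlining follows; the sysfs root is taken as a parameter purely so the
sketch can be exercised against a mock directory tree (on a real system it
would be /sys/devices/system/cpu, and online_all_cpus is a hypothetical name):

```shell
# online_all_cpus: write 1 to every cpuN/online file under the given
# root that currently reads 0, leaving already-online CPUs untouched.
online_all_cpus() {
	root=$1
	for c in "$root"/cpu[0-9]*; do
		# cpu0 typically has no online file; skip entries without one
		[ -f "$c/online" ] || continue
		if [ "$(cat "$c/online")" = "0" ]; then
			echo 1 > "$c/online"
		fi
	done
}
```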

3) Memory hotplug emulation:

The emulator reserves memory before the OS boots; the reserved memory region
is removed from the e820 table. Each online node has an add_memory interface,
and memory can be hot-added via the per-node add_memory debugfs interface.

The difficulty of memory removal is well known; we have no plan for it at
present.

 - reserve memory through a kernel boot parameter
 	mem=1024m

 - add a memory section to node 3
    # echo 0x40000000 > mem_hotplug/node3/add_memory
	OR
    # echo 1024m > mem_hotplug/node3/add_memory
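Both accepted value forms can be wrapped in a single helper; a sketch, where
add_mem_to_node is a hypothetical name and the debugfs root is a parameter so
the sketch can be exercised against a mock tree (on a real system it would
default to /sys/kernel/debug):

```shell
# add_mem_to_node: hot-add memory to node <nid> through the emulator's
# per-node debugfs file. <what> is either a physical address such as
# 0x40000000 or a size such as 1024m, as both forms are accepted above.
add_mem_to_node() {
	nid=$1; what=$2; dbg=${3:-/sys/kernel/debug}
	echo "$what" > "$dbg/mem_hotplug/node${nid}/add_memory"
}
```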

* ACKNOWLEDGMENT 

The NUMA Hotplug Emulator is the result of a team's efforts; thanks to all of
them. They are:
Andi Kleen, Haicheng Li, Shaohui Zheng, Fengguang Wu, David Rientjes and
Yongkang You


Thanks & Regards,
Shaohui




* [1/7,v8] NUMA Hotplug Emulator: documentation
  2010-12-07  1:00 ` shaohui.zheng
@ 2010-12-07  1:00   ` shaohui.zheng
  -1 siblings, 0 replies; 41+ messages in thread
From: shaohui.zheng @ 2010-12-07  1:00 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
	dave, gregkh, Haicheng Li, Shaohui Zheng

[-- Attachment #1: 001-hotplug-emulator-doc-x86_64-of-numa-hotplug-emulator.patch --]
[-- Type: text/plain, Size: 4240 bytes --]

From: Shaohui Zheng <shaohui.zheng@intel.com>

Add a text file, Documentation/x86/x86_64/numa_hotplug_emulator.txt,
to explain the usage of the hotplug emulator.

Reviewed-By: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
---
Index: linux-hpe4/Documentation/x86/x86_64/numa_hotplug_emulator.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-hpe4/Documentation/x86/x86_64/numa_hotplug_emulator.txt	2010-12-07 08:53:19.677622002 +0800
@@ -0,0 +1,102 @@
+NUMA Hotplug Emulator for x86_64
+---------------------------------------------------
+
+NUMA hotplug emulator is able to emulate NUMA node hotplug
+in a purely software way. It intends to help people easily debug
+and test node/CPU/memory hotplug related code on a machine
+without NUMA hotplug support, even a UMA machine or a virtual
+environment.
+
+1) Node hotplug emulation:
+
+Adds a numa=possible=<N> command line option to set an additional N nodes
+as being possible for memory hotplug.  This set of possible nodes
+control nr_node_ids and the sizes of several dynamically allocated node
+arrays.
+
+This allows memory hotplug to create new nodes for newly added memory
+rather than binding it to existing nodes.
+
+For emulation on x86, it would be possible to set aside memory for hotplugged
+nodes (say, anything above 2G) and to add an additional four nodes as being
+possible on boot with
+
+	mem=2G numa=possible=4
+
+and then creating a new 128M node at runtime:
+
+	# echo 128M@0x80000000 > /sys/kernel/debug/node/add_node
+	On node 1 totalpages: 0
+	init_memory_mapping: 0000000080000000-0000000088000000
+	 0080000000 - 0088000000 page 2M
+
+Once the new node has been added, its memory can be onlined.  If this
+memory represents memory section 16, for example:
+
+	# echo online > /sys/devices/system/memory/memory16/state
+	Built 2 zonelists in Node order, mobility grouping on.  Total pages: 514846
+	Policy zone: Normal
+ [ The memory section(s) mapped to a particular node are visible via
+   /sys/devices/system/node/node1, in this example. ]
+
+2) CPU hotplug emulation:
+
+The emulator reserves CPUs through a boot parameter; the reserved CPUs can
+be hot-added and hot-removed by software. This emulates the process of
+physical CPU hotplug.
+
+When hotplugging a CPU with the emulator, we use a logical CPU to emulate
+the CPU socket hotplug process. On CPUs supporting SMT, some logical CPUs
+sit in the same socket but may be located in a different NUMA node under
+the emulator. We put the logical CPU into a fake CPU socket and assign it a
+unique phys_proc_id. Each fake socket holds only one logical CPU.
+
+ - to hide CPUs
+	- Use the boot option "maxcpus=N" to hide CPUs;
+	  N is the number of CPUs to initialize; the rest will be hidden.
+	- Use the boot option "cpu_hpe=on" to enable CPU hotplug emulation;
+	  when cpu_hpe is enabled, the remaining CPUs will not be initialized.
+
+ - to hot-add CPU to node
+	# echo nid > cpu/probe
+
+ - to hot-remove CPU
+	# echo nid > cpu/release
+
+3) Memory hotplug emulation:
+
+The emulator reserves memory before the OS boots; the reserved memory region
+is removed from the e820 table. Each online node has an add_memory interface,
+and memory can be hot-added via the per-node add_memory debugfs interface.
+
+The difficulty of memory removal is well known; we have no plan for it at
+present.
+
+ - reserve memory through a kernel boot parameter
+ 	mem=1024m
+
+ - add a memory section to node 3
+    # echo 0x40000000 > mem_hotplug/node3/add_memory
+	OR
+    # echo 1024m > mem_hotplug/node3/add_memory
+
+4) Script for hotplug testing
+
+These scripts provide convenience when hot-adding memory/CPUs in batches.
+
+- Online all memory sections:
+for m in /sys/devices/system/memory/memory*;
+do
+	echo online > $m/state;
+done
+
+- CPU Online:
+for c in /sys/devices/system/cpu/cpu*;
+do
+	echo 1 > $c/online;
+done
+
+- David Rientjes <rientjes@google.com>
+- Haicheng Li <haicheng.li@intel.com>
+- Shaohui Zheng <shaohui.zheng@intel.com>
+  Nov 2010

-- 
Thanks & Regards,
Shaohui




* [2/7,v8] NUMA Hotplug Emulator: Add numa=possible option
  2010-12-07  1:00 ` shaohui.zheng
@ 2010-12-07  1:00   ` shaohui.zheng
  -1 siblings, 0 replies; 41+ messages in thread
From: shaohui.zheng @ 2010-12-07  1:00 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
	dave, gregkh, Haicheng Li, Shaohui Zheng

[-- Attachment #1: 002-add-node-possible-option.patch --]
[-- Type: text/plain, Size: 3341 bytes --]

From:  David Rientjes <rientjes@google.com>

Adds a numa=possible=<N> command line option to set an additional N nodes
as being possible for memory hotplug.  This set of possible nodes
controls nr_node_ids and the sizes of several dynamically allocated node
arrays.

This allows memory hotplug to create new nodes for newly added memory
rather than binding it to existing nodes.

The first use-case for this will be node hotplug emulation which will use
these possible nodes to create new nodes to test the memory hotplug
callbacks and surrounding memory hotplug code.

CC: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
---
 Documentation/x86/x86_64/boot-options.txt |    4 ++++
 arch/x86/mm/numa_64.c                     |   18 +++++++++++++++---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
--- a/Documentation/x86/x86_64/boot-options.txt
+++ b/Documentation/x86/x86_64/boot-options.txt
@@ -174,6 +174,10 @@ NUMA
 		If given as an integer, fills all system RAM with N fake nodes
 		interleaved over physical nodes.
 
+  numa=possible=<N>
+		Sets an additional N nodes as being possible for memory
+		hotplug.
+
 ACPI
 
   acpi=off	Don't enable ACPI
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -33,6 +33,7 @@ s16 apicid_to_node[MAX_LOCAL_APIC] __cpuinitdata = {
 int numa_off __initdata;
 static unsigned long __initdata nodemap_addr;
 static unsigned long __initdata nodemap_size;
+static unsigned long __initdata numa_possible_nodes;
 
 /*
  * Map cpu index to node index
@@ -611,7 +612,7 @@ void __init initmem_init(unsigned long start_pfn, unsigned long last_pfn,
 
 #ifdef CONFIG_NUMA_EMU
 	if (cmdline && !numa_emulation(start_pfn, last_pfn, acpi, k8))
-		return;
+		goto out;
 	nodes_clear(node_possible_map);
 	nodes_clear(node_online_map);
 #endif
@@ -619,14 +620,14 @@ void __init initmem_init(unsigned long start_pfn, unsigned long last_pfn,
 #ifdef CONFIG_ACPI_NUMA
 	if (!numa_off && acpi && !acpi_scan_nodes(start_pfn << PAGE_SHIFT,
 						  last_pfn << PAGE_SHIFT))
-		return;
+		goto out;
 	nodes_clear(node_possible_map);
 	nodes_clear(node_online_map);
 #endif
 
 #ifdef CONFIG_K8_NUMA
 	if (!numa_off && k8 && !k8_scan_nodes())
-		return;
+		goto out;
 	nodes_clear(node_possible_map);
 	nodes_clear(node_online_map);
 #endif
@@ -646,6 +647,15 @@ void __init initmem_init(unsigned long start_pfn, unsigned long last_pfn,
 		numa_set_node(i, 0);
 	memblock_x86_register_active_regions(0, start_pfn, last_pfn);
 	setup_node_bootmem(0, start_pfn << PAGE_SHIFT, last_pfn << PAGE_SHIFT);
+out: __maybe_unused
+	for (i = 0; i < numa_possible_nodes; i++) {
+		int nid;
+
+		nid = first_unset_node(node_possible_map);
+		if (nid == MAX_NUMNODES)
+			break;
+		node_set(nid, node_possible_map);
+	}
 }
 
 unsigned long __init numa_free_all_bootmem(void)
@@ -675,6 +685,8 @@ static __init int numa_setup(char *opt)
 	if (!strncmp(opt, "noacpi", 6))
 		acpi_numa = -1;
 #endif
+	if (!strncmp(opt, "possible=", 9))
+		numa_possible_nodes = simple_strtoul(opt + 9, NULL, 0);
 	return 0;
 }
 early_param("numa", numa_setup);

-- 
Thanks & Regards,
Shaohui



^ permalink raw reply	[flat|nested] 41+ messages in thread


* [3/7,v8] NUMA Hotplug Emulator: Add node hotplug emulation
  2010-12-07  1:00 ` shaohui.zheng
@ 2010-12-07  1:00   ` shaohui.zheng
  -1 siblings, 0 replies; 41+ messages in thread
From: shaohui.zheng @ 2010-12-07  1:00 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
	dave, gregkh, Haicheng Li, Shaohui Zheng

[-- Attachment #1: 003-node-hotpluge-emulation.patch --]
[-- Type: text/plain, Size: 5483 bytes --]

From: David Rientjes <rientjes@google.com>

Add an interface to allow new nodes to be added when performing memory
hot-add.  This provides a convenient interface to test memory hotplug
notifier callbacks and surrounding hotplug code when new nodes are
onlined without actually having a machine with such hotpluggable SRAT
entries.

This adds a new debugfs interface at /sys/kernel/debug/mem_hotplug/add_node
that behaves in a similar way to the memory hot-add "probe" interface.
Its format is size@start, where "size" is the size of the new node to be
added and "start" is the physical address of the new memory.

The new node id is a currently offline, but possible, node.  The bit must
be set in node_possible_map so that nr_node_ids is sized appropriately.

For emulation on x86, for example, it would be possible to set aside
memory for hotplugged nodes (say, anything above 2G) and to add an
additional four nodes as being possible on boot with

	mem=2G numa=possible=4

and then creating a new 128M node at runtime:

	# echo 128M@0x80000000 > /sys/kernel/debug/mem_hotplug/add_node
	On node 1 totalpages: 0
	init_memory_mapping: 0000000080000000-0000000088000000
	 0080000000 - 0088000000 page 2M

Once the new node has been added, its memory can be onlined.  If this
memory represents memory section 16, for example:

	# echo online > /sys/devices/system/memory/memory16/state
	Built 2 zonelists in Node order, mobility grouping on.  Total pages: 514846
	Policy zone: Normal
 [ The memory section(s) mapped to a particular node are visible via
   /sys/devices/system/node/node1, in this example. ]

The new node is now hotplugged and ready for testing.
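
The size@start parsing done by add_node_store() (patch body below) can be
modeled in userspace like this; my_memparse() is a simplified stand-in for
the kernel's memparse(), and the section size passed to the parser is an
assumption for illustration:

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned long long u64;

/* Simplified memparse(): a number with an optional K/M/G suffix. */
u64 my_memparse(const char *s, char **end)
{
	u64 v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;	/* fall through */
	case 'M': case 'm':
		v <<= 10;	/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	}
	return v;
}

/*
 * Parse "size@start" the way add_node_store() does: reject sizes below
 * one memory section and any string missing the '@' separator.
 * Returns 0 on success, -1 (standing for -EINVAL) otherwise.
 */
int parse_add_node(const char *buf, u64 section_bytes, u64 *size, u64 *start)
{
	char *p;

	*size = my_memparse(buf, &p);
	if (*size < section_bytes)
		return -1;
	if (*p != '@')
		return -1;
	*start = strtoull(p + 1, NULL, 0);
	return 0;
}
```

With a 128M section size, "echo 128M@0x80000000 > .../add_node" is accepted
while a 64M request is rejected.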

CC: Haicheng Li <haicheng.li@intel.com>
CC: Greg KH <gregkh@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
---
 Documentation/memory-hotplug.txt |   24 +++++++++++++++
 mm/memory_hotplug.c              |   59 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+), 0 deletions(-)
Index: linux-hpe4/Documentation/memory-hotplug.txt
===================================================================
--- linux-hpe4.orig/Documentation/memory-hotplug.txt	2010-11-30 12:40:43.527622001 +0800
+++ linux-hpe4/Documentation/memory-hotplug.txt	2010-11-30 14:11:11.827622000 +0800
@@ -18,6 +18,7 @@
 4. Physical memory hot-add phase
   4.1 Hardware(Firmware) Support
   4.2 Notify memory hot-add event by hand
+  4.3 Node hotplug emulation
 5. Logical Memory hot-add phase
   5.1. State of memory
   5.2. How to online memory
@@ -215,6 +216,29 @@
 Please see "How to online memory" in this text.
 
 
+4.3 Node hotplug emulation
+------------
+With debugfs, it is possible to test node hotplug by assigning newly
+added memory to a new node id, using an interface whose behavior is
+similar to the "probe" interface described in section 4.2.  If a node id
+is possible but not yet online (it appears in
+/sys/devices/system/node/possible but not in node/online), then it may
+be used to emulate a newly added node via the debugfs "add_node" interface.
+
+The add_node interface is located at "mem_hotplug/add_node" at the debugfs
+mount point.
+
+You can create a new node of a specified size starting at the physical
+address of new memory by
+
+% echo size@start_address_of_new_memory > /sys/kernel/debug/mem_hotplug/add_node
+
+Where "size" can be represented in megabytes or gigabytes (for example,
+"128M" or "1G").  The minimum size is that of a memory section.
+
+Once the new node has been added, it is possible to online the memory by
+toggling the "state" of its memory section(s) as described in section 5.1.
+
 
 ------------------------------
 5. Logical Memory hot-add phase
Index: linux-hpe4/mm/memory_hotplug.c
===================================================================
--- linux-hpe4.orig/mm/memory_hotplug.c	2010-11-30 12:40:43.757622001 +0800
+++ linux-hpe4/mm/memory_hotplug.c	2010-11-30 14:02:33.877622002 +0800
@@ -924,3 +924,63 @@
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 EXPORT_SYMBOL_GPL(remove_memory);
+
+#ifdef CONFIG_DEBUG_FS
+#include <linux/debugfs.h>
+
+static struct dentry *memhp_debug_root;
+
+static ssize_t add_node_store(struct file *file, const char __user *buf,
+				size_t count, loff_t *ppos)
+{
+	nodemask_t mask;
+	u64 start, size;
+	char buffer[64];
+	char *p;
+	int nid;
+	int ret;
+
+	memset(buffer, 0, sizeof(buffer));
+	if (count > sizeof(buffer) - 1)
+		count = sizeof(buffer) - 1;
+	if (copy_from_user(buffer, buf, count))
+		return -EFAULT;
+
+	size = memparse(buffer, &p);
+	if (size < (PAGES_PER_SECTION << PAGE_SHIFT))
+		return -EINVAL;
+	if (*p != '@')
+		return -EINVAL;
+
+	start = simple_strtoull(p + 1, NULL, 0);
+
+	nodes_andnot(mask, node_possible_map, node_online_map);
+	nid = first_node(mask);
+	if (nid == MAX_NUMNODES)
+		return -ENOMEM;
+
+	ret = add_memory(nid, start, size);
+	return ret ? ret : count;
+}
+
+static const struct file_operations add_node_file_ops = {
+	.write		= add_node_store,
+	.llseek		= generic_file_llseek,
+};
+
+static int __init node_debug_init(void)
+{
+	if (!memhp_debug_root)
+		memhp_debug_root = debugfs_create_dir("mem_hotplug", NULL);
+	if (!memhp_debug_root)
+		return -ENOMEM;
+
+	if (!debugfs_create_file("add_node", S_IWUSR, memhp_debug_root,
+			NULL, &add_node_file_ops))
+		return -ENOMEM;
+
+	return 0;
+}
+
+module_init(node_debug_init);
+#endif /* CONFIG_DEBUG_FS */

-- 
Thanks & Regards,
Shaohui



^ permalink raw reply	[flat|nested] 41+ messages in thread


* [4/7,v8] NUMA Hotplug Emulator: Abstract cpu register functions
  2010-12-07  1:00 ` shaohui.zheng
@ 2010-12-07  1:00   ` shaohui.zheng
  -1 siblings, 0 replies; 41+ messages in thread
From: shaohui.zheng @ 2010-12-07  1:00 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
	dave, gregkh, Shaohui Zheng

[-- Attachment #1: 004-hotplug-emulator-x86-abstract-cpu-register-functions.patch --]
[-- Type: text/plain, Size: 3359 bytes --]

From: Shaohui Zheng <shaohui.zheng@intel.com>

Abstract the CPU register functions and provide a more flexible interface,
register_cpu_node(), which makes it convenient to add a CPU to a specified
node; we can use it to add a CPU to a fake node.
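
The compatibility pattern the patch uses -- keep the old entry point and
route it through the new, more general one -- can be sketched in userspace
(cpu_to_node() and the recording globals here are illustrative stubs, not
the kernel's):

```c
#include <assert.h>

/* Illustrative stub: pretend there are 4 CPUs per node. */
int cpu_to_node(int num)
{
	return num / 4;
}

static int last_num = -1, last_nid = -1;

/* New interface: the caller chooses the node explicitly. */
int register_cpu_node(int num, int nid)
{
	last_num = num;
	last_nid = nid;
	return 0;
}

/* Old interface, unchanged for existing callers: the node is still derived
 * from the CPU number, as the inline wrapper in <linux/cpu.h> does. */
int register_cpu(int num)
{
	return register_cpu_node(num, cpu_to_node(num));
}
```

Existing register_cpu() callers keep their behavior, while the emulator can
call register_cpu_node() directly to place a CPU on a fake node.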

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
---
Index: linux-hpe4/arch/x86/include/asm/cpu.h
===================================================================
--- linux-hpe4.orig/arch/x86/include/asm/cpu.h	2010-11-17 09:00:59.742608402 +0800
+++ linux-hpe4/arch/x86/include/asm/cpu.h	2010-11-17 09:01:10.192838977 +0800
@@ -27,6 +27,7 @@
 
 #ifdef CONFIG_HOTPLUG_CPU
 extern int arch_register_cpu(int num);
+extern int arch_register_cpu_node(int num, int nid);
 extern void arch_unregister_cpu(int);
 #endif
 
Index: linux-hpe4/arch/x86/kernel/topology.c
===================================================================
--- linux-hpe4.orig/arch/x86/kernel/topology.c	2010-11-17 09:01:01.053461766 +0800
+++ linux-hpe4/arch/x86/kernel/topology.c	2010-11-17 10:05:32.934085248 +0800
@@ -52,6 +52,15 @@
 }
 EXPORT_SYMBOL(arch_register_cpu);
 
+int __ref arch_register_cpu_node(int num, int nid)
+{
+	if (num)
+		per_cpu(cpu_devices, num).cpu.hotpluggable = 1;
+
+	return register_cpu_node(&per_cpu(cpu_devices, num).cpu, num, nid);
+}
+EXPORT_SYMBOL(arch_register_cpu_node);
+
 void arch_unregister_cpu(int num)
 {
 	unregister_cpu(&per_cpu(cpu_devices, num).cpu);
Index: linux-hpe4/drivers/base/cpu.c
===================================================================
--- linux-hpe4.orig/drivers/base/cpu.c	2010-11-17 09:01:01.053461766 +0800
+++ linux-hpe4/drivers/base/cpu.c	2010-11-17 10:05:32.943465010 +0800
@@ -208,17 +208,18 @@
 static SYSDEV_CLASS_ATTR(offline, 0444, print_cpus_offline, NULL);
 
 /*
- * register_cpu - Setup a sysfs device for a CPU.
+ * register_cpu_node - Setup a sysfs device for a CPU.
  * @cpu - cpu->hotpluggable field set to 1 will generate a control file in
  *	  sysfs for this CPU.
  * @num - CPU number to use when creating the device.
+ * @nid - Node ID to use, if any.
  *
  * Initialize and register the CPU device.
  */
-int __cpuinit register_cpu(struct cpu *cpu, int num)
+int __cpuinit register_cpu_node(struct cpu *cpu, int num, int nid)
 {
 	int error;
-	cpu->node_id = cpu_to_node(num);
+	cpu->node_id = nid;
 	cpu->sysdev.id = num;
 	cpu->sysdev.cls = &cpu_sysdev_class;
 
@@ -229,7 +230,7 @@
 	if (!error)
 		per_cpu(cpu_sys_devices, num) = &cpu->sysdev;
 	if (!error)
-		register_cpu_under_node(num, cpu_to_node(num));
+		register_cpu_under_node(num, nid);
 
 #ifdef CONFIG_KEXEC
 	if (!error)
Index: linux-hpe4/include/linux/cpu.h
===================================================================
--- linux-hpe4.orig/include/linux/cpu.h	2010-11-17 09:00:59.772898926 +0800
+++ linux-hpe4/include/linux/cpu.h	2010-11-17 10:05:32.954085309 +0800
@@ -30,7 +30,13 @@
 	struct sys_device sysdev;
 };
 
-extern int register_cpu(struct cpu *cpu, int num);
+extern int register_cpu_node(struct cpu *cpu, int num, int nid);
+
+static inline int register_cpu(struct cpu *cpu, int num)
+{
+	return register_cpu_node(cpu, num, cpu_to_node(num));
+}
+
 extern struct sys_device *get_cpu_sysdev(unsigned cpu);
 
 extern int cpu_add_sysdev_attr(struct sysdev_attribute *attr);

-- 
Thanks & Regards,
Shaohui



^ permalink raw reply	[flat|nested] 41+ messages in thread


* [5/7,v8] NUMA Hotplug Emulator: Support cpu probe/release in x86_64
  2010-12-07  1:00 ` shaohui.zheng
@ 2010-12-07  1:00   ` shaohui.zheng
  -1 siblings, 0 replies; 41+ messages in thread
From: shaohui.zheng @ 2010-12-07  1:00 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
	dave, gregkh, Ingo Molnar, Len Brown, Yinghai Lu, Shaohui Zheng,
	Haicheng Li

[-- Attachment #1: 005-hotplug-emulator-x86-support-cpu-probe-release-in-x86.patch --]
[-- Type: text/plain, Size: 10688 bytes --]

From: Shaohui Zheng <shaohui.zheng@intel.com>

CPU physical hot-add/hot-remove is supported on some hardware and is already
supported by the current Linux kernel. The NUMA Hotplug Emulator provides a
mechanism to emulate the process in software; it can be used for testing or
debugging purposes.

CPU physical hotplug is different from logical CPU online/offline. Logical
online/offline is controlled by the interface
/sys/devices/system/cpu/cpuX/online, while the CPU hotplug emulator uses the
probe/release interface, which makes it possible to automate and stress-test
CPU hotplug.

Add probe/release cpu interfaces under sysfs for x86_64. Users can use these
interfaces to emulate the cpu hot-add and hot-remove process.
Directive:
*) Reserve CPUs through a kernel boot parameter such as:
	maxcpus=4

The remaining CPUs will not be initialized.

*) Probe a CPU
We can use the probe interface to hot-add a new CPU:
	echo nid > /sys/devices/system/cpu/probe

*) Release a CPU
	echo cpu > /sys/devices/system/cpu/release

A reserved CPU will be hot-added to the specified node:
1) nid == 0: the CPU will be added to the real node that the CPU
   should be in.
2) nid != 0: the CPU will be added to node nid even though it is a fake node.
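
The validation arch_cpu_probe() performs on the written string can be
modeled in userspace as follows (nr_node_ids and the online array are
stand-ins for kernel state; -1 stands for the patch's -EPERM, and the
strlen() check models its count >= 2 requirement):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Model of the input checks in arch_cpu_probe(): a write of "nid" is
 * rejected unless 0 <= nid < nr_node_ids and node nid is online.
 * Returns the accepted nid, or -1 on rejection.
 */
int probe_check(const char *buf, int nr_node_ids,
		const unsigned char *node_online)
{
	long nid;

	if (!buf || strlen(buf) < 2)	/* the patch requires count >= 2 */
		return -1;
	nid = strtol(buf, NULL, 0);
	if (nid < 0 || nid > nr_node_ids - 1)
		return -1;
	if (!node_online[nid])
		return -1;
	return (int)nid;
}
```

So "echo 1 > /sys/devices/system/cpu/probe" corresponds to
probe_check("1\n", ...), which succeeds only if node 1 is already online.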

CC: Ingo Molnar <mingo@elte.hu>
CC: Len Brown <len.brown@intel.com>
CC: Yinghai Lu <Yinghai.Lu@Sun.COM>
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
---
Index: linux-hpe4/arch/x86/kernel/acpi/boot.c
===================================================================
--- linux-hpe4.orig/arch/x86/kernel/acpi/boot.c	2010-11-26 09:24:40.287725018 +0800
+++ linux-hpe4/arch/x86/kernel/acpi/boot.c	2010-11-26 09:24:53.277724996 +0800
@@ -647,8 +647,44 @@
 }
 EXPORT_SYMBOL(acpi_map_lsapic);
 
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+static void acpi_map_cpu2node_emu(int cpu, int physid, int nid)
+{
+#ifdef CONFIG_ACPI_NUMA
+#ifdef CONFIG_X86_64
+	apicid_to_node[physid] = nid;
+	numa_set_node(cpu, nid);
+#else /* CONFIG_X86_32 */
+	apicid_2_node[physid] = nid;
+	cpu_to_node_map[cpu] = nid;
+#endif
+#endif
+}
+
+static u16 cpu_to_apicid_saved[CONFIG_NR_CPUS];
+int __ref acpi_map_lsapic_emu(int pcpu, int nid)
+{
+	/* backup cpu apicid to array cpu_to_apicid_saved */
+	if (cpu_to_apicid_saved[pcpu] == 0 &&
+		per_cpu(x86_cpu_to_apicid, pcpu) != BAD_APICID)
+		cpu_to_apicid_saved[pcpu] = per_cpu(x86_cpu_to_apicid, pcpu);
+
+	per_cpu(x86_cpu_to_apicid, pcpu) = cpu_to_apicid_saved[pcpu];
+	acpi_map_cpu2node_emu(pcpu, per_cpu(x86_cpu_to_apicid, pcpu), nid);
+
+	return pcpu;
+}
+EXPORT_SYMBOL(acpi_map_lsapic_emu);
+#endif
+
 int acpi_unmap_lsapic(int cpu)
 {
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+	/* backup cpu apicid to array cpu_to_apicid_saved */
+	if (cpu_to_apicid_saved[cpu] == 0 &&
+		per_cpu(x86_cpu_to_apicid, cpu) != BAD_APICID)
+		cpu_to_apicid_saved[cpu] = per_cpu(x86_cpu_to_apicid, cpu);
+#endif
 	per_cpu(x86_cpu_to_apicid, cpu) = -1;
 	set_cpu_present(cpu, false);
 	num_processors--;
Index: linux-hpe4/arch/x86/kernel/smpboot.c
===================================================================
--- linux-hpe4.orig/arch/x86/kernel/smpboot.c	2010-11-26 09:24:40.297724969 +0800
+++ linux-hpe4/arch/x86/kernel/smpboot.c	2010-11-26 12:48:58.977725001 +0800
@@ -107,8 +107,6 @@
         mutex_unlock(&x86_cpu_hotplug_driver_mutex);
 }
 
-ssize_t arch_cpu_probe(const char *buf, size_t count) { return -1; }
-ssize_t arch_cpu_release(const char *buf, size_t count) { return -1; }
 #else
 static struct task_struct *idle_thread_array[NR_CPUS] __cpuinitdata ;
 #define get_idle_for_cpu(x)      (idle_thread_array[(x)])
Index: linux-hpe4/arch/x86/kernel/topology.c
===================================================================
--- linux-hpe4.orig/arch/x86/kernel/topology.c	2010-11-26 09:24:52.477725000 +0800
+++ linux-hpe4/arch/x86/kernel/topology.c	2010-11-26 12:48:58.987725001 +0800
@@ -30,6 +30,9 @@
 #include <linux/init.h>
 #include <linux/smp.h>
 #include <asm/cpu.h>
+#include <linux/cpu.h>
+#include <linux/topology.h>
+#include <linux/acpi.h>
 
 static DEFINE_PER_CPU(struct x86_cpu, cpu_devices);
 
@@ -66,6 +69,74 @@
 	unregister_cpu(&per_cpu(cpu_devices, num).cpu);
 }
 EXPORT_SYMBOL(arch_unregister_cpu);
+
+ssize_t arch_cpu_probe(const char *buf, size_t count)
+{
+	int nid = 0;
+	int num = 0, selected = 0;
+
+	/* check parameters */
+	if (!buf || count < 2)
+		return -EPERM;
+
+	nid = simple_strtoul(buf, NULL, 0);
+	printk(KERN_DEBUG "Add a cpu to node : %d\n", nid);
+
+	if (nid < 0 || nid > nr_node_ids - 1) {
+		printk(KERN_ERR "Invalid NUMA node id: %d (0 <= nid < %d).\n",
+			nid, nr_node_ids);
+		return -EPERM;
+	}
+
+	if (!node_online(nid)) {
+		printk(KERN_ERR "NUMA node %d is not online, give up.\n", nid);
+		return -EPERM;
+	}
+
+	/* find first uninitialized cpu */
+	for_each_present_cpu(num) {
+		if (per_cpu(cpu_sys_devices, num) == NULL) {
+			selected = num;
+			break;
+		}
+	}
+
+	if (selected < 0 || selected >= num_possible_cpus()) {
+		printk(KERN_ERR "No free cpu, give up cpu probing.\n");
+		return -EPERM;
+	}
+
+	/* register cpu */
+	arch_register_cpu_node(selected, nid);
+	acpi_map_lsapic_emu(selected, nid);
+
+	return count;
+}
+EXPORT_SYMBOL(arch_cpu_probe);
+
+ssize_t arch_cpu_release(const char *buf, size_t count)
+{
+	int cpu = 0;
+
+	cpu = simple_strtoul(buf, NULL, 0);
+	/* cpu 0 is not hotpluggable */
+	if (cpu == 0) {
+		printk(KERN_ERR "cannot release cpu 0.\n");
+		return -EPERM;
+	}
+
+	if (cpu_online(cpu)) {
+		printk(KERN_DEBUG "offline cpu %d.\n", cpu);
+		cpu_down(cpu);
+	}
+
+	arch_unregister_cpu(cpu);
+	acpi_unmap_lsapic(cpu);
+
+	return count;
+}
+EXPORT_SYMBOL(arch_cpu_release);
+
 #else /* CONFIG_HOTPLUG_CPU */
 
 static int __init arch_register_cpu(int num)
@@ -83,8 +154,14 @@
 		register_one_node(i);
 #endif
 
-	for_each_present_cpu(i)
-		arch_register_cpu(i);
+	/*
+	 * when cpu hotplug emulation is enabled, register only the online
+	 * cpus; the rest are reserved for cpu probe.
+	 */
+	for_each_present_cpu(i) {
+		if ((cpu_hpe_on && cpu_online(i)) || !cpu_hpe_on)
+			arch_register_cpu(i);
+	}
 
 	return 0;
 }
Index: linux-hpe4/arch/x86/mm/numa_64.c
===================================================================
--- linux-hpe4.orig/arch/x86/mm/numa_64.c	2010-11-26 09:24:40.317724965 +0800
+++ linux-hpe4/arch/x86/mm/numa_64.c	2010-11-26 09:24:53.297725001 +0800
@@ -12,6 +12,7 @@
 #include <linux/module.h>
 #include <linux/nodemask.h>
 #include <linux/sched.h>
+#include <linux/cpu.h>
 
 #include <asm/e820.h>
 #include <asm/proto.h>
@@ -785,6 +786,19 @@
 }
 #endif
 
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+static __init int cpu_hpe_setup(char *opt)
+{
+	if (!opt)
+		return -EINVAL;
+
+	if (!strncmp(opt, "on", 2) || !strncmp(opt, "1", 1))
+		cpu_hpe_on = 1;
+
+	return 0;
+}
+early_param("cpu_hpe", cpu_hpe_setup);
+#endif  /* CONFIG_ARCH_CPU_PROBE_RELEASE */
 
 void __cpuinit numa_set_node(int cpu, int node)
 {
Index: linux-hpe4/drivers/acpi/processor_driver.c
===================================================================
--- linux-hpe4.orig/drivers/acpi/processor_driver.c	2010-11-26 09:24:40.327725004 +0800
+++ linux-hpe4/drivers/acpi/processor_driver.c	2010-11-26 09:24:53.297725001 +0800
@@ -530,6 +530,14 @@
 		goto err_free_cpumask;
 
 	sysdev = get_cpu_sysdev(pr->id);
+	/*
+	 * Reserve the cpu for hotplug emulation; a reserved cpu can be
+	 * hot-added through the cpu probe interface. Return directly.
+	 */
+	if (sysdev == NULL) {
+		goto out;
+	}
+
 	if (sysfs_create_link(&device->dev.kobj, &sysdev->kobj, "sysdev")) {
 		result = -EFAULT;
 		goto err_remove_fs;
@@ -570,6 +578,7 @@
 		goto err_remove_sysfs;
 	}
 
+out:
 	return 0;
 
 err_remove_sysfs:
Index: linux-hpe4/drivers/base/cpu.c
===================================================================
--- linux-hpe4.orig/drivers/base/cpu.c	2010-11-26 09:24:52.477725000 +0800
+++ linux-hpe4/drivers/base/cpu.c	2010-11-26 09:24:53.297725001 +0800
@@ -22,9 +22,15 @@
 };
 EXPORT_SYMBOL(cpu_sysdev_class);
 
-static DEFINE_PER_CPU(struct sys_device *, cpu_sys_devices);
+DEFINE_PER_CPU(struct sys_device *, cpu_sys_devices);
 
 #ifdef CONFIG_HOTPLUG_CPU
+/*
+ * cpu_hpe_on is a switch to enable/disable cpu hotplug emulation. It is
+ * disabled by default; enable it with the boot parameter cpu_hpe=on.
+ */
+int cpu_hpe_on;
+
 static ssize_t show_online(struct sys_device *dev, struct sysdev_attribute *attr,
 			   char *buf)
 {
Index: linux-hpe4/include/linux/acpi.h
===================================================================
--- linux-hpe4.orig/include/linux/acpi.h	2010-11-26 09:24:40.347725041 +0800
+++ linux-hpe4/include/linux/acpi.h	2010-11-26 09:24:53.297725001 +0800
@@ -102,6 +102,7 @@
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 /* Arch dependent functions for cpu hotplug support */
 int acpi_map_lsapic(acpi_handle handle, int *pcpu);
+int acpi_map_lsapic_emu(int pcpu, int nid);
 int acpi_unmap_lsapic(int cpu);
 #endif /* CONFIG_ACPI_HOTPLUG_CPU */
 
Index: linux-hpe4/include/linux/cpu.h
===================================================================
--- linux-hpe4.orig/include/linux/cpu.h	2010-11-26 09:24:52.477725000 +0800
+++ linux-hpe4/include/linux/cpu.h	2010-11-26 09:24:53.297725001 +0800
@@ -30,6 +30,8 @@
 	struct sys_device sysdev;
 };
 
+DECLARE_PER_CPU(struct sys_device *, cpu_sys_devices);
+
 extern int register_cpu_node(struct cpu *cpu, int num, int nid);
 
 static inline int register_cpu(struct cpu *cpu, int num)
@@ -149,6 +151,7 @@
 #define register_hotcpu_notifier(nb)	register_cpu_notifier(nb)
 #define unregister_hotcpu_notifier(nb)	unregister_cpu_notifier(nb)
 int cpu_down(unsigned int cpu);
+extern int cpu_hpe_on;
 
 #ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
 extern void cpu_hotplug_driver_lock(void);
@@ -171,6 +174,7 @@
 /* These aren't inline functions due to a GCC bug. */
 #define register_hotcpu_notifier(nb)	({ (void)(nb); 0; })
 #define unregister_hotcpu_notifier(nb)	({ (void)(nb); })
+#define cpu_hpe_on	0
 #endif		/* CONFIG_HOTPLUG_CPU */
 
 #ifdef CONFIG_PM_SLEEP_SMP
Index: linux-hpe4/Documentation/x86/x86_64/boot-options.txt
===================================================================
--- linux-hpe4.orig/Documentation/x86/x86_64/boot-options.txt	2010-11-26 12:49:44.847725099 +0800
+++ linux-hpe4/Documentation/x86/x86_64/boot-options.txt	2010-11-26 12:55:50.527724999 +0800
@@ -316,3 +316,9 @@
 		Do not use GB pages for kernel direct mappings.
 	gbpages
 		Use GB pages for kernel direct mappings.
+	cpu_hpe=on/off
+		Enable/disable CPU hotplug emulation in software. When cpu_hpe=on,
+		sysfs provides probe/release interfaces to hot-add/remove CPUs
+		dynamically. Use maxcpus=<N> to reserve CPUs for later probing.
+		This option is disabled by default.
+
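For reference, a typical session with the interfaces added by this patchset might look like the following sketch; the CPU number (cpu4) and node id are illustrative and depend on the machine and the maxcpus= value:

```shell
# Boot the patched kernel with: cpu_hpe=on maxcpus=4
# (the remaining present CPUs stay reserved for probing)

echo 1 > /sys/devices/system/cpu/probe        # hot-add a reserved CPU to node 1
echo 1 > /sys/devices/system/cpu/cpu4/online  # bring it online as usual
echo 0 > /sys/devices/system/cpu/cpu4/online  # offline it again
echo 4 > /sys/devices/system/cpu/release      # hot-remove it
```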

-- 
Thanks & Regards,
Shaohui



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [6/7,v8] NUMA Hotplug Emulator: Fake CPU socket with logical CPU on x86
  2010-12-07  1:00 ` shaohui.zheng
@ 2010-12-07  1:00   ` shaohui.zheng
  -1 siblings, 0 replies; 41+ messages in thread
From: shaohui.zheng @ 2010-12-07  1:00 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
	dave, gregkh, Sam Ravnborg, Haicheng Li, Shaohui Zheng

[-- Attachment #1: 006-hotplug-emulator-fake_socket_with_logic_cpu_on_x86.patch --]
[-- Type: text/plain, Size: 7693 bytes --]

From: Shaohui Zheng <shaohui.zheng@intel.com>

When hot-plugging a CPU with the emulator, we use a logical CPU to emulate
the CPU hotplug process. On CPUs that support SMT, several logical CPUs share
the same physical socket, but after emulation they may end up in different
NUMA nodes. This misleads the scheduler into building an incorrect
scheduling-domain hierarchy, and causes the following call trace when the
scheduling domains are rebalanced:

divide error: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu8/online
CPU 0 
Modules linked in: fbcon tileblit font bitblit softcursor radeon ttm drm_kms_helper e1000e usbhid via_rhine mii drm i2c_algo_bit igb dca
Pid: 0, comm: swapper Not tainted 2.6.32hpe #78 X8DTN
RIP: 0010:[<ffffffff81051da5>]  [<ffffffff81051da5>] find_busiest_group+0x6c5/0xa10
RSP: 0018:ffff880028203c30  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000015ac0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff880277e8cfa0 RDI: 0000000000000000
RBP: ffff880028203dc0 R08: ffff880277e8cfa0 R09: 0000000000000040
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f16cfc85770 CR3: 0000000001001000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff81822000, task ffffffff8184a600)
Stack:
 ffff880028203d60 ffff880028203cd0 ffff8801c204ff08 ffff880028203e38
<0> 0101ffff81018c59 ffff880028203e44 00000001810806bd ffff8801c204fe00
<0> 0000000528200000 ffffffff00000000 0000000000000018 0000000000015ac0
Call Trace:
 <IRQ> 
 [<ffffffff81088ee0>] ? tick_dev_program_event+0x40/0xd0
 [<ffffffff81053b2c>] rebalance_domains+0x17c/0x570
 [<ffffffff81018c89>] ? read_tsc+0x9/0x20
 [<ffffffff81088ee0>] ? tick_dev_program_event+0x40/0xd0
 [<ffffffff810569ed>] run_rebalance_domains+0xbd/0xf0
 [<ffffffff8106471f>] __do_softirq+0xaf/0x1e0
 [<ffffffff810b7d18>] ? handle_IRQ_event+0x58/0x160
 [<ffffffff810130ac>] call_softirq+0x1c/0x30
 [<ffffffff81014a85>] do_softirq+0x65/0xa0
 [<ffffffff810645cd>] irq_exit+0x7d/0x90
 [<ffffffff81013ff0>] do_IRQ+0x70/0xe0
 [<ffffffff810128d3>] ret_from_intr+0x0/0x11
 <EOI> 
 [<ffffffff8133387f>] ? acpi_idle_enter_bm+0x281/0x2b5
 [<ffffffff81333878>] ? acpi_idle_enter_bm+0x27a/0x2b5
 [<ffffffff8145dc8f>] ? cpuidle_idle_call+0x9f/0x130
 [<ffffffff81010e2b>] ? cpu_idle+0xab/0x100
 [<ffffffff8158aee6>] ? rest_init+0x66/0x70
 [<ffffffff81905d90>] ? start_kernel+0x3e3/0x3ef
 [<ffffffff8190533a>] ? x86_64_start_reservations+0x125/0x129
 [<ffffffff81905438>] ? x86_64_start_kernel+0xfa/0x109
Code: 00 00 e9 4c fb ff ff 0f 1f 80 00 00 00 00 48 8b b5 d8 fe ff ff 48 8b 45 a8 4d 29 ef 8b 56 08 48 c1 e0 0a 49 89 f0 48 89 d7 31 d2 <48> f7 f7 31 d2 48 89 45 a0 8b 76 08 4c 89 f0 48 c1 e0 0a 48 f7 
RIP  [<ffffffff81051da5>] find_busiest_group+0x6c5/0xa10
 RSP <ffff880028203c30>

Solution:

We put the logical CPU into a fake CPU socket and assign it a unique
phys_proc_id. Each fake socket contains exactly one logical CPU. This
method fixes the bug above.

CC: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
---
Index: linux-hpe4/arch/x86/include/asm/processor.h
===================================================================
--- linux-hpe4.orig/arch/x86/include/asm/processor.h	2010-11-17 09:00:51.354100239 +0800
+++ linux-hpe4/arch/x86/include/asm/processor.h	2010-11-17 09:01:10.222837594 +0800
@@ -113,6 +113,15 @@
 	/* Index into per_cpu list: */
 	u16			cpu_index;
 #endif
+
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+	/*
+	 * Use a logical cpu to emulate a physical cpu's hotplug. We put the
+	 * logical cpu into a fake socket, assign a fake physical id to it,
+	 * and create a fake core.
+	 */
+	__u8		cpu_probe_on; /* A flag to enable cpu probe/release */
+#endif
 } __attribute__((__aligned__(SMP_CACHE_BYTES)));
 
 #define X86_VENDOR_INTEL	0
Index: linux-hpe4/arch/x86/kernel/smpboot.c
===================================================================
--- linux-hpe4.orig/arch/x86/kernel/smpboot.c	2010-11-17 09:01:10.202837209 +0800
+++ linux-hpe4/arch/x86/kernel/smpboot.c	2010-11-17 09:01:10.222837594 +0800
@@ -97,6 +97,7 @@
  */
 static DEFINE_MUTEX(x86_cpu_hotplug_driver_mutex);
 
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
 void cpu_hotplug_driver_lock()
 {
         mutex_lock(&x86_cpu_hotplug_driver_mutex);
@@ -106,6 +107,7 @@
 {
         mutex_unlock(&x86_cpu_hotplug_driver_mutex);
 }
+#endif
 
 #else
 static struct task_struct *idle_thread_array[NR_CPUS] __cpuinitdata ;
@@ -198,6 +200,8 @@
 {
 	int cpuid, phys_id;
 	unsigned long timeout;
+	u8 cpu_probe_on = 0;
+	struct cpuinfo_x86 *c;
 
 	/*
 	 * If waken up by an INIT in an 82489DX configuration
@@ -277,7 +281,20 @@
 	/*
 	 * Save our processor parameters
 	 */
+	c = &cpu_data(cpuid);
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+	cpu_probe_on = c->cpu_probe_on;
+	phys_id = c->phys_proc_id;
+#endif
+
 	smp_store_cpu_info(cpuid);
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+	if (cpu_probe_on) {
+		c->phys_proc_id = phys_id; /* restore the fake phys_proc_id */
+		c->cpu_core_id = 0; /* force the logical cpu to core 0 */
+		c->cpu_probe_on = cpu_probe_on;
+	}
+#endif
 
 	notify_cpu_starting(cpuid);
 
@@ -400,6 +417,11 @@
 {
 	int i;
 	struct cpuinfo_x86 *c = &cpu_data(cpu);
+	int cpu_probe_on = 0;
+
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+	cpu_probe_on = c->cpu_probe_on;
+#endif
 
 	cpumask_set_cpu(cpu, cpu_sibling_setup_mask);
 
@@ -431,7 +453,8 @@
 
 	for_each_cpu(i, cpu_sibling_setup_mask) {
 		if (per_cpu(cpu_llc_id, cpu) != BAD_APICID &&
-		    per_cpu(cpu_llc_id, cpu) == per_cpu(cpu_llc_id, i)) {
+		    per_cpu(cpu_llc_id, cpu) == per_cpu(cpu_llc_id, i) &&
+			cpu_probe_on == 0) {
 			cpumask_set_cpu(i, c->llc_shared_map);
 			cpumask_set_cpu(cpu, cpu_data(i).llc_shared_map);
 		}
Index: linux-hpe4/arch/x86/kernel/topology.c
===================================================================
--- linux-hpe4.orig/arch/x86/kernel/topology.c	2010-11-17 09:01:10.202837209 +0800
+++ linux-hpe4/arch/x86/kernel/topology.c	2010-11-17 09:01:10.222837594 +0800
@@ -70,6 +70,36 @@
 }
 EXPORT_SYMBOL(arch_unregister_cpu);
 
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+/*
+ * Put the logical cpu into a new sokect, and encapsule it into core 0.
+ */
+static void fake_cpu_socket_info(int cpu)
+{
+	struct cpuinfo_x86 *c = &cpu_data(cpu);
+	int i, phys_id = 0;
+
+	/* calculate the max phys_id */
+	for_each_present_cpu(i) {
+		struct cpuinfo_x86 *c = &cpu_data(i);
+		if (phys_id < c->phys_proc_id)
+			phys_id = c->phys_proc_id;
+	}
+
+	c->phys_proc_id = phys_id + 1; /* pick an unused phys_proc_id */
+	c->cpu_core_id = 0; /* always put the logical cpu to core 0 */
+	c->cpu_probe_on = 1;
+}
+
+static void clear_cpu_socket_info(int cpu)
+{
+	struct cpuinfo_x86 *c = &cpu_data(cpu);
+	c->phys_proc_id = 0;
+	c->cpu_core_id = 0;
+	c->cpu_probe_on = 0;
+}
+
+
 ssize_t arch_cpu_probe(const char *buf, size_t count)
 {
 	int nid = 0;
@@ -109,6 +139,7 @@
 	/* register cpu */
 	arch_register_cpu_node(selected, nid);
 	acpi_map_lsapic_emu(selected, nid);
+	fake_cpu_socket_info(selected);
 
 	return count;
 }
@@ -132,10 +163,13 @@
 
 	arch_unregister_cpu(cpu);
 	acpi_unmap_lsapic(cpu);
+	clear_cpu_socket_info(cpu);
+	set_cpu_present(cpu, true);
 
 	return count;
 }
 EXPORT_SYMBOL(arch_cpu_release);
+#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
 
 #else /* CONFIG_HOTPLUG_CPU */
 

-- 
Thanks & Regards,
Shaohui



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [6/7,v8] NUMA Hotplug Emulator: Fake CPU socket with logical CPU on x86
@ 2010-12-07  1:00   ` shaohui.zheng
  0 siblings, 0 replies; 41+ messages in thread
From: shaohui.zheng @ 2010-12-07  1:00 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
	dave, gregkh, Sam Ravnborg, Haicheng Li, Shaohui Zheng

[-- Attachment #1: 006-hotplug-emulator-fake_socket_with_logic_cpu_on_x86.patch --]
[-- Type: text/plain, Size: 7989 bytes --]

From: Shaohui Zheng <shaohui.zheng@intel.com>

When hotplug a CPU with emulator, we are using a logical CPU to emulate the
CPU hotplug process. For the CPU supported SMT, some logical CPUs are in the
same socket, but it may located in different NUMA node after we have emulator.
it misleads the scheduling domain to build the incorrect hierarchy, and it
causes the following call trace when rebalance the scheduling domain:

divide error: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu8/online
CPU 0 
Modules linked in: fbcon tileblit font bitblit softcursor radeon ttm drm_kms_helper e1000e usbhid via_rhine mii drm i2c_algo_bit igb dca
Pid: 0, comm: swapper Not tainted 2.6.32hpe #78 X8DTN
RIP: 0010:[<ffffffff81051da5>]  [<ffffffff81051da5>] find_busiest_group+0x6c5/0xa10
RSP: 0018:ffff880028203c30  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000015ac0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff880277e8cfa0 RDI: 0000000000000000
RBP: ffff880028203dc0 R08: ffff880277e8cfa0 R09: 0000000000000040
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f16cfc85770 CR3: 0000000001001000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff81822000, task ffffffff8184a600)
Stack:
 ffff880028203d60 ffff880028203cd0 ffff8801c204ff08 ffff880028203e38
<0> 0101ffff81018c59 ffff880028203e44 00000001810806bd ffff8801c204fe00
<0> 0000000528200000 ffffffff00000000 0000000000000018 0000000000015ac0
Call Trace:
 <IRQ> 
 [<ffffffff81088ee0>] ? tick_dev_program_event+0x40/0xd0
 [<ffffffff81053b2c>] rebalance_domains+0x17c/0x570
 [<ffffffff81018c89>] ? read_tsc+0x9/0x20
 [<ffffffff81088ee0>] ? tick_dev_program_event+0x40/0xd0
 [<ffffffff810569ed>] run_rebalance_domains+0xbd/0xf0
 [<ffffffff8106471f>] __do_softirq+0xaf/0x1e0
 [<ffffffff810b7d18>] ? handle_IRQ_event+0x58/0x160
 [<ffffffff810130ac>] call_softirq+0x1c/0x30
 [<ffffffff81014a85>] do_softirq+0x65/0xa0
 [<ffffffff810645cd>] irq_exit+0x7d/0x90
 [<ffffffff81013ff0>] do_IRQ+0x70/0xe0
 [<ffffffff810128d3>] ret_from_intr+0x0/0x11
 <EOI> 
 [<ffffffff8133387f>] ? acpi_idle_enter_bm+0x281/0x2b5
 [<ffffffff81333878>] ? acpi_idle_enter_bm+0x27a/0x2b5
 [<ffffffff8145dc8f>] ? cpuidle_idle_call+0x9f/0x130
 [<ffffffff81010e2b>] ? cpu_idle+0xab/0x100
 [<ffffffff8158aee6>] ? rest_init+0x66/0x70
 [<ffffffff81905d90>] ? start_kernel+0x3e3/0x3ef
 [<ffffffff8190533a>] ? x86_64_start_reservations+0x125/0x129
 [<ffffffff81905438>] ? x86_64_start_kernel+0xfa/0x109
Code: 00 00 e9 4c fb ff ff 0f 1f 80 00 00 00 00 48 8b b5 d8 fe ff ff 48 8b 45 a8 4d 29 ef 8b 56 08 48 c1 e0 0a 49 89 f0 48 89 d7 31 d2 <48> f7 f7 31 d2 48 89 45 a0 8b 76 08 4c 89 f0 48 c1 e0 0a 48 f7 
RIP  [<ffffffff81051da5>] find_busiest_group+0x6c5/0xa10
 RSP <ffff880028203c30>

Solution:

We put the logical CPU into a fake CPU socket, and assign it an unique
 phys_proc_id. For the fake socket, we put one logical CPU in only. This
method fixes the above bug.

CC: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
---
Index: linux-hpe4/arch/x86/include/asm/processor.h
===================================================================
--- linux-hpe4.orig/arch/x86/include/asm/processor.h	2010-11-17 09:00:51.354100239 +0800
+++ linux-hpe4/arch/x86/include/asm/processor.h	2010-11-17 09:01:10.222837594 +0800
@@ -113,6 +113,15 @@
 	/* Index into per_cpu list: */
 	u16			cpu_index;
 #endif
+
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+	/*
+	 * Use a logic cpu to emulate a physical cpu's hotplug. We put the
+	 * logical cpu into a fake socket, assign a fake physical id to it,
+	 * and create a fake core.
+	 */
+	__u8		cpu_probe_on; /* A flag to enable cpu probe/release */
+#endif
 } __attribute__((__aligned__(SMP_CACHE_BYTES)));
 
 #define X86_VENDOR_INTEL	0
Index: linux-hpe4/arch/x86/kernel/smpboot.c
===================================================================
--- linux-hpe4.orig/arch/x86/kernel/smpboot.c	2010-11-17 09:01:10.202837209 +0800
+++ linux-hpe4/arch/x86/kernel/smpboot.c	2010-11-17 09:01:10.222837594 +0800
@@ -97,6 +97,7 @@
  */
 static DEFINE_MUTEX(x86_cpu_hotplug_driver_mutex);
 
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
 void cpu_hotplug_driver_lock()
 {
         mutex_lock(&x86_cpu_hotplug_driver_mutex);
@@ -106,6 +107,7 @@
 {
         mutex_unlock(&x86_cpu_hotplug_driver_mutex);
 }
+#endif
 
 #else
 static struct task_struct *idle_thread_array[NR_CPUS] __cpuinitdata ;
@@ -198,6 +200,8 @@
 {
 	int cpuid, phys_id;
 	unsigned long timeout;
+	u8 cpu_probe_on = 0;
+	struct cpuinfo_x86 *c;
 
 	/*
 	 * If waken up by an INIT in an 82489DX configuration
@@ -277,7 +281,20 @@
 	/*
 	 * Save our processor parameters
 	 */
+	c = &cpu_data(cpuid);
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+	cpu_probe_on = c->cpu_probe_on;
+	phys_id = c->phys_proc_id;
+#endif
+
 	smp_store_cpu_info(cpuid);
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+	if (cpu_probe_on) {
+		c->phys_proc_id = phys_id; /* restore the fake phys_proc_id */
+		c->cpu_core_id = 0; /* force the logical cpu to core 0 */
+		c->cpu_probe_on = cpu_probe_on;
+	}
+#endif
 
 	notify_cpu_starting(cpuid);
 
@@ -400,6 +417,11 @@
 {
 	int i;
 	struct cpuinfo_x86 *c = &cpu_data(cpu);
+	int cpu_probe_on = 0;
+
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+	cpu_probe_on = c->cpu_probe_on;
+#endif
 
 	cpumask_set_cpu(cpu, cpu_sibling_setup_mask);
 
@@ -431,7 +453,8 @@
 
 	for_each_cpu(i, cpu_sibling_setup_mask) {
 		if (per_cpu(cpu_llc_id, cpu) != BAD_APICID &&
-		    per_cpu(cpu_llc_id, cpu) == per_cpu(cpu_llc_id, i)) {
+		    per_cpu(cpu_llc_id, cpu) == per_cpu(cpu_llc_id, i) &&
+		    cpu_probe_on == 0) {
 			cpumask_set_cpu(i, c->llc_shared_map);
 			cpumask_set_cpu(cpu, cpu_data(i).llc_shared_map);
 		}
Index: linux-hpe4/arch/x86/kernel/topology.c
===================================================================
--- linux-hpe4.orig/arch/x86/kernel/topology.c	2010-11-17 09:01:10.202837209 +0800
+++ linux-hpe4/arch/x86/kernel/topology.c	2010-11-17 09:01:10.222837594 +0800
@@ -70,6 +70,36 @@
 }
 EXPORT_SYMBOL(arch_unregister_cpu);
 
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+/*
+ * Put the logical cpu into a new socket, and encapsulate it as core 0.
+ */
+static void fake_cpu_socket_info(int cpu)
+{
+	struct cpuinfo_x86 *c = &cpu_data(cpu);
+	int i, phys_id = 0;
+
+	/* calculate the max phys_id */
+	for_each_present_cpu(i) {
+		struct cpuinfo_x86 *c = &cpu_data(i);
+		if (phys_id < c->phys_proc_id)
+			phys_id = c->phys_proc_id;
+	}
+
+	c->phys_proc_id = phys_id + 1; /* pick an unused phys_proc_id */
+	c->cpu_core_id = 0; /* always put the logical cpu to core 0 */
+	c->cpu_probe_on = 1;
+}
+
+static void clear_cpu_socket_info(int cpu)
+{
+	struct cpuinfo_x86 *c = &cpu_data(cpu);
+	c->phys_proc_id = 0;
+	c->cpu_core_id = 0;
+	c->cpu_probe_on = 0;
+}
+
+
 ssize_t arch_cpu_probe(const char *buf, size_t count)
 {
 	int nid = 0;
@@ -109,6 +139,7 @@
 	/* register cpu */
 	arch_register_cpu_node(selected, nid);
 	acpi_map_lsapic_emu(selected, nid);
+	fake_cpu_socket_info(selected);
 
 	return count;
 }
@@ -132,10 +163,13 @@
 
 	arch_unregister_cpu(cpu);
 	acpi_unmap_lsapic(cpu);
+	clear_cpu_socket_info(cpu);
+	set_cpu_present(cpu, true);
 
 	return count;
 }
 EXPORT_SYMBOL(arch_cpu_release);
+#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
 
 #else /* CONFIG_HOTPLUG_CPU */
 

-- 
Thanks & Regards,
Shaohui


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [7/7,v8] NUMA Hotplug Emulator: Implement per-node add_memory debugfs interface
  2010-12-07  1:00 ` shaohui.zheng
@ 2010-12-07  1:00   ` shaohui.zheng
  -1 siblings, 0 replies; 41+ messages in thread
From: shaohui.zheng @ 2010-12-07  1:00 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
	dave, gregkh, Haicheng Li, Shaohui Zheng

[-- Attachment #1: 007-hotplug-emulator-add-memory-debugfs-interface.patch --]
[-- Type: text/plain, Size: 4673 bytes --]

From:  Shaohui Zheng <shaohui.zheng@intel.com>

Add an add_memory interface under debugfs to support memory hotplug emulation
for each online node. With this interface, reserved memory can be added to
the desired node.

The layout on debugfs:
	mem_hotplug/node0/add_memory
	mem_hotplug/node1/add_memory
	mem_hotplug/node2/add_memory
	...

Add a memory section (128M) to node 3 (when booted with mem=1024m):

	echo 0x40000000 > mem_hotplug/node3/add_memory

The interface also accepts the address in human-readable form:

	echo 1024m > mem_hotplug/node3/add_memory

CC: David Rientjes <rientjes@google.com>
CC: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
---
Index: linux-hpe4/mm/memory_hotplug.c
===================================================================
--- linux-hpe4.orig/mm/memory_hotplug.c	2010-12-02 12:35:31.557622002 +0800
+++ linux-hpe4/mm/memory_hotplug.c	2010-12-06 07:30:36.067622001 +0800
@@ -930,6 +930,80 @@
 
 static struct dentry *memhp_debug_root;
 
+#ifdef CONFIG_ARCH_MEMORY_PROBE
+
+static ssize_t add_memory_store(struct file *file, const char __user *buf,
+				size_t count, loff_t *ppos)
+{
+	u64 phys_addr = 0;
+	int nid = file->private_data - NULL;
+	int ret;
+
+	printk(KERN_INFO "Add a memory section to node: %d.\n", nid);
+
+	phys_addr = memparse(buf, NULL);
+	ret = add_memory(nid, phys_addr, PAGES_PER_SECTION << PAGE_SHIFT);
+
+	if (ret)
+		count = ret;
+
+	return count;
+}
+
+static int add_memory_open(struct inode *inode, struct file *file)
+{
+	file->private_data = inode->i_private;
+	return 0;
+}
+
+static const struct file_operations add_memory_file_ops = {
+	.open		= add_memory_open,
+	.write		= add_memory_store,
+	.llseek		= generic_file_llseek,
+};
+
+/*
+ * Create add_memory debugfs entry under specified node
+ */
+static int debugfs_create_add_memory_entry(int nid)
+{
+	char buf[32];
+	static struct dentry *node_debug_root;
+
+	snprintf(buf, sizeof(buf), "node%d", nid);
+	node_debug_root = debugfs_create_dir(buf, memhp_debug_root);
+
+	/* the nid is encoded as a pointer offset (NULL + nid) */
+	if (!debugfs_create_file("add_memory", S_IWUSR, node_debug_root,
+			NULL + nid, &add_memory_file_ops))
+		return -ENOMEM;
+
+	return 0;
+}
+
+static int __init memory_debug_init(void)
+{
+	int nid;
+
+	if (!memhp_debug_root)
+		memhp_debug_root = debugfs_create_dir("mem_hotplug", NULL);
+	if (!memhp_debug_root)
+		return -ENOMEM;
+
+	for_each_online_node(nid)
+		 debugfs_create_add_memory_entry(nid);
+
+	return 0;
+}
+
+module_init(memory_debug_init);
+#else
+static int debugfs_create_add_memory_entry(int nid)
+{
+	return 0;
+}
+#endif /* CONFIG_ARCH_MEMORY_PROBE */
+
 static ssize_t add_node_store(struct file *file, const char __user *buf,
 				size_t count, loff_t *ppos)
 {
@@ -960,6 +1034,8 @@
 		return -ENOMEM;
 
 	ret = add_memory(nid, start, size);
+
+	debugfs_create_add_memory_entry(nid);
 	return ret ? ret : count;
 }
 
Index: linux-hpe4/Documentation/memory-hotplug.txt
===================================================================
--- linux-hpe4.orig/Documentation/memory-hotplug.txt	2010-12-02 12:35:31.557622002 +0800
+++ linux-hpe4/Documentation/memory-hotplug.txt	2010-12-06 07:39:36.007622000 +0800
@@ -19,6 +19,7 @@
   4.1 Hardware(Firmware) Support
   4.2 Notify memory hot-add event by hand
   4.3 Node hotplug emulation
+  4.4 Memory hotplug emulation
 5. Logical Memory hot-add phase
   5.1. State of memory
   5.2. How to online memory
@@ -239,6 +240,29 @@
 Once the new node has been added, it is possible to online the memory by
 toggling the "state" of its memory section(s) as described in section 5.1.
 
+4.4 Memory hotplug emulation
+------------
+With debugfs, it is possible to test memory hotplug in software: a memory
+section can be added to the desired node through the add_memory interface.
+It is a more flexible interface than "probe" described in section 4.2.
+
+There is an add_memory interface for each online node at the debugfs mount
+point.
+	mem_hotplug/node0/add_memory
+	mem_hotplug/node1/add_memory
+	mem_hotplug/node2/add_memory
+	...
+
+Add a memory section (128M) to node 3 (when booted with mem=1024m):
+
+	echo 0x40000000 > mem_hotplug/node3/add_memory
+
+The interface also accepts the address in human-readable form:
+
+	echo 1024m > mem_hotplug/node3/add_memory
+
+Once the new memory section has been added, it is possible to online the memory
+by toggling the "state" described in section 5.1.
 
 ------------------------------
 5. Logical Memory hot-add phase

-- 
Thanks & Regards,
Shaohui



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [1/7,v8] NUMA Hotplug Emulator: documentation
  2010-12-07  1:00   ` shaohui.zheng
  (?)
@ 2010-12-07 18:24   ` Eric B Munson
  2010-12-07 23:20       ` Shaohui Zheng
  -1 siblings, 1 reply; 41+ messages in thread
From: Eric B Munson @ 2010-12-07 18:24 UTC (permalink / raw)
  To: shaohui.zheng
  Cc: akpm, linux-mm, linux-kernel, haicheng.li, lethal, ak,
	shaohui.zheng, rientjes, dave, gregkh, Haicheng Li

[-- Attachment #1: Type: text/plain, Size: 5093 bytes --]

Shaohui,

The documentation patch seems to be stale; it needs to be updated to match
the new file names.

On Tue, 07 Dec 2010, shaohui.zheng@intel.com wrote:

> From: Shaohui Zheng <shaohui.zheng@intel.com>
> 
> add a text file Documentation/x86/x86_64/numa_hotplug_emulator.txt
> to explain the usage for the hotplug emulator.
> 
> Reviewed-By: Randy Dunlap <randy.dunlap@oracle.com>
> Signed-off-by: David Rientjes <rientjes@google.com>
> Signed-off-by: Haicheng Li <haicheng.li@intel.com>
> Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
> ---
> Index: linux-hpe4/Documentation/x86/x86_64/numa_hotplug_emulator.txt
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-hpe4/Documentation/x86/x86_64/numa_hotplug_emulator.txt	2010-12-07 08:53:19.677622002 +0800
> @@ -0,0 +1,102 @@
> +NUMA Hotplug Emulator for x86_64
> +---------------------------------------------------
> +
> +NUMA hotplug emulator is able to emulate NUMA node hotplug
> +purely in software. It is intended to help people easily debug
> +and test node/CPU/memory hotplug related code on a machine
> +without NUMA hotplug support, even a UMA machine or a virtual
> +environment.
> +
> +1) Node hotplug emulation:
> +
> +Adds a numa=possible=<N> command line option to set an additional N nodes
> +as being possible for memory hotplug.  This set of possible nodes
> +control nr_node_ids and the sizes of several dynamically allocated node
> +arrays.
> +
> +This allows memory hotplug to create new nodes for newly added memory
> +rather than binding it to existing nodes.
> +
> +For emulation on x86, it would be possible to set aside memory for hotplugged
> +nodes (say, anything above 2G) and to add an additional four nodes as being
> +possible on boot with
> +
> +	mem=2G numa=possible=4
> +
> +and then creating a new 128M node at runtime:
> +
> +	# echo 128M@0x80000000 > /sys/kernel/debug/node/add_node
> +	On node 1 totalpages: 0
> +	init_memory_mapping: 0000000080000000-0000000088000000
> +	 0080000000 - 0088000000 page 2M
> +
> +Once the new node has been added, its memory can be onlined.  If this
> +memory represents memory section 16, for example:
> +
> +	# echo online > /sys/devices/system/memory/memory16/state
> +	Built 2 zonelists in Node order, mobility grouping on.  Total pages: 514846
> +	Policy zone: Normal
> + [ The memory section(s) mapped to a particular node are visible via
> +   /sys/devices/system/node/node1, in this example. ]
> +
> +2) CPU hotplug emulation:
> +
> +The emulator reserves CPUs through a boot parameter; the reserved CPUs can
> +then be hot-added/hot-removed in software, emulating the process of
> +physical CPU hotplug.
> +
> +When hotplugging a CPU with the emulator, we use a logical CPU to emulate the
> +CPU socket hotplug process. For CPUs supporting SMT, some logical CPUs are in
> +the same socket, but they may be located in different NUMA nodes once the
> +emulator is used. We put the logical CPU into a fake CPU socket and assign it
> +a unique phys_proc_id. Each fake socket contains only one logical CPU.
> +
> + - to hide CPUs
> +	- Use boot option "maxcpus=N" to hide CPUs:
> +	  N is the number of CPUs to initialize; the rest will be hidden.
> +	- Use boot option "cpu_hpe=on" to enable CPU hotplug emulation:
> +	  when cpu_hpe is enabled, the remaining CPUs will not be initialized.
> +
> + - to hot-add CPU to node
> +	# echo nid > cpu/probe
> +
> + - to hot-remove CPU
> +	# echo nid > cpu/release
> +
> +3) Memory hotplug emulation:
> +
> +The emulator reserves memory before the OS boots; the reserved memory region
> +is removed from the e820 table. Each online node has an add_memory interface,
> +and memory can be hot-added via the per-node add_memory debugfs interface.
> +
> +The difficulty of memory release is well-known; we have no plan for it at
> +present.
> +
> + - reserve memory through a kernel boot parameter
> + 	mem=1024m
> +
> + - add a memory section to node 3
> +    # echo 0x40000000 > mem_hotplug/node3/add_memory
> +	OR
> +    # echo 1024m > mem_hotplug/node3/add_memory
> +
> +4) Script for hotplug testing
> +
> +These scripts provide convenience when hot-adding memory/CPUs in batch.
> +
> +- Online all memory sections:
> +for m in /sys/devices/system/memory/memory*;
> +do
> +	echo online > $m/state;
> +done
> +
> +- CPU Online:
> +for c in /sys/devices/system/cpu/cpu*;
> +do
> +	echo 1 > $c/online;
> +done
> +
> +- David Rientjes <rientjes@google.com>
> +- Haicheng Li <haicheng.li@intel.com>
> +- Shaohui Zheng <shaohui.zheng@intel.com>
> +  Nov 2010
> 
> -- 
> Thanks & Regards,
> Shaohui
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [1/7,v8] NUMA Hotplug Emulator: documentation
  2010-12-07 18:24   ` Eric B Munson
@ 2010-12-07 23:20       ` Shaohui Zheng
  0 siblings, 0 replies; 41+ messages in thread
From: Shaohui Zheng @ 2010-12-07 23:20 UTC (permalink / raw)
  To: Eric B Munson
  Cc: shaohui.zheng, akpm, linux-mm, linux-kernel, haicheng.li, lethal,
	ak, rientjes, dave, gregkh, Haicheng Li

On Tue, Dec 07, 2010 at 11:24:20AM -0700, Eric B Munson wrote:
> Shaohui,
> 
> The documentation patch seems to be stale, it needs to be updated to match the
> new file names.
> 
Eric,
	The major change in this patchset is the interface: for the v8 emulator,
we accepted David's per-node debugfs add_memory interface, which is already
covered in the documentation patch. The change is small, so it is not obvious.

This is the change on the documentation compare with v7:
+3) Memory hotplug emulation:
+
+The emulator reserves memory before the OS boots; the reserved memory region
+is removed from the e820 table. Each online node has an add_memory interface,
+and memory can be hot-added via the per-node add_memory debugfs interface.
+
+The difficulty of memory release is well-known; we have no plan for it at
+present.
+
+ - reserve memory through a kernel boot parameter
+ 	mem=1024m
+
+ - add a memory section to node 3
+    # echo 0x40000000 > mem_hotplug/node3/add_memory
+	OR
+    # echo 1024m > mem_hotplug/node3/add_memory
+

-- 
Thanks & Regards,
Shaohui


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [1/7,v8] NUMA Hotplug Emulator: documentation
  2010-12-07 23:20       ` Shaohui Zheng
  (?)
@ 2010-12-08 17:46       ` Eric B Munson
  2010-12-09  0:09           ` Shaohui Zheng
  -1 siblings, 1 reply; 41+ messages in thread
From: Eric B Munson @ 2010-12-08 17:46 UTC (permalink / raw)
  To: Shaohui Zheng
  Cc: shaohui.zheng, akpm, linux-mm, linux-kernel, haicheng.li, lethal,
	ak, rientjes, dave, gregkh, Haicheng Li

[-- Attachment #1: Type: text/plain, Size: 6409 bytes --]

Shaohui,

I have had some success.  I had been confused about which files to use to
online memory.  The latest patch sorted it out for me and I can now online
disabled memory in new nodes.  I still cannot online an offlined cpu.  Of the
12 available threads, I have 8 activated on boot with the kernel command line:

mem=8G numa=possible=12 maxcpus=8 cpu_hpe=on

I can offline a CPU just fine according to the kernel:
root@bert:/sys/devices/system/cpu# echo 7 > release
(dmesg)
[  911.494852] offline cpu 7.
[  911.694323] CPU 7 is now offline

But when I try to re-add it, I get an error:
root@bert:/sys/devices/system/cpu# echo 0 > probe
(dmesg)
Dec  8 10:41:55 bert kernel: [ 1190.095051] ------------[ cut here ]------------
Dec  8 10:41:55 bert kernel: [ 1190.095056] WARNING: at fs/sysfs/dir.c:451 sysfs_add_one+0xce/0x180()
Dec  8 10:41:55 bert kernel: [ 1190.095057] Hardware name: System Product Name
Dec  8 10:41:55 bert kernel: [ 1190.095058] sysfs: cannot create duplicate filename '/devices/system/cpu/cpu7'
Dec  8 10:41:55 bert kernel: [ 1190.095060] Modules linked in: nfs binfmt_misc lockd fscache nfs_acl auth_rpcgss sunrpc snd_hda_codec_hdmi snd_hda_codec_realtek radeon snd_hda_intel snd_hda_codec snd_cmipci gameport snd_pcm ttm snd_opl3_lib drm_kms_helper snd_hwdep snd_mpu401_uart drm uvcvideo snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq xhci_hcd snd_timer videodev snd_seq_device snd psmouse i7core_edac i2c_algo_bit edac_core joydev v4l1_compat shpchp snd_page_alloc v4l2_compat_ioctl32 soundcore hwmon_vid asus_atk0110 max6650 serio_raw hid_microsoft usbhid hid firewire_ohci firewire_core crc_itu_t ahci sky2 libahci
Dec  8 10:41:55 bert kernel: [ 1190.095088] Pid: 2369, comm: bash Tainted: G        W   2.6.37-rc5-numa-test+ #3
Dec  8 10:41:55 bert kernel: [ 1190.095089] Call Trace:
Dec  8 10:41:55 bert kernel: [ 1190.095094]  [<ffffffff8105eb1f>] warn_slowpath_common+0x7f/0xc0
Dec  8 10:41:55 bert kernel: [ 1190.095096]  [<ffffffff8105ec16>] warn_slowpath_fmt+0x46/0x50
Dec  8 10:41:55 bert kernel: [ 1190.095098]  [<ffffffff811cf77e>] sysfs_add_one+0xce/0x180
Dec  8 10:41:55 bert kernel: [ 1190.095100]  [<ffffffff811cf8b1>] create_dir+0x81/0xd0
Dec  8 10:41:55 bert kernel: [ 1190.095102]  [<ffffffff811cf97d>] sysfs_create_dir+0x7d/0xd0
Dec  8 10:41:55 bert kernel: [ 1190.095106]  [<ffffffff815a2b3d>] ? sub_preempt_count+0x9d/0xd0
Dec  8 10:41:55 bert kernel: [ 1190.095109]  [<ffffffff812c9ffd>] kobject_add_internal+0xbd/0x200
Dec  8 10:41:55 bert kernel: [ 1190.095111]  [<ffffffff812ca258>] kobject_add_varg+0x38/0x60
Dec  8 10:41:55 bert kernel: [ 1190.095113]  [<ffffffff812ca2d3>] kobject_init_and_add+0x53/0x70
Dec  8 10:41:55 bert kernel: [ 1190.095117]  [<ffffffff8139475f>] sysdev_register+0x6f/0xf0
Dec  8 10:41:55 bert kernel: [ 1190.095121]  [<ffffffff81598f38>] register_cpu_node+0x32/0x88
Dec  8 10:41:55 bert kernel: [ 1190.095123]  [<ffffffff8158207e>] arch_register_cpu_node+0x3e/0x40
Dec  8 10:41:55 bert kernel: [ 1190.095127]  [<ffffffff8101220e>] arch_cpu_probe+0x10e/0x1f0
Dec  8 10:41:55 bert kernel: [ 1190.095129]  [<ffffffff813989d4>] cpu_probe_store+0x14/0x20
Dec  8 10:41:55 bert kernel: [ 1190.095131]  [<ffffffff81393ef0>] sysdev_class_store+0x20/0x30
Dec  8 10:41:55 bert kernel: [ 1190.095133]  [<ffffffff811cd925>] sysfs_write_file+0xe5/0x170
Dec  8 10:41:55 bert kernel: [ 1190.095137]  [<ffffffff811624c8>] vfs_write+0xc8/0x190
Dec  8 10:41:55 bert kernel: [ 1190.095139]  [<ffffffff81162e61>] sys_write+0x51/0x90
Dec  8 10:41:55 bert kernel: [ 1190.095142]  [<ffffffff8100c142>] system_call_fastpath+0x16/0x1b
Dec  8 10:41:55 bert kernel: [ 1190.095144] ---[ end trace f615c2a524d318ea ]---
Dec  8 10:41:55 bert kernel: [ 1190.095149] Pid: 2369, comm: bash Tainted: G        W   2.6.37-rc5-numa-test+ #3
Dec  8 10:41:55 bert kernel: [ 1190.095150] Call Trace:
Dec  8 10:41:55 bert kernel: [ 1190.095152]  [<ffffffff812ca09b>] kobject_add_internal+0x15b/0x200
Dec  8 10:41:55 bert kernel: [ 1190.095154]  [<ffffffff812ca258>] kobject_add_varg+0x38/0x60
Dec  8 10:41:55 bert kernel: [ 1190.095156]  [<ffffffff812ca2d3>] kobject_init_and_add+0x53/0x70
Dec  8 10:41:55 bert kernel: [ 1190.095158]  [<ffffffff8139475f>] sysdev_register+0x6f/0xf0
Dec  8 10:41:55 bert kernel: [ 1190.095160]  [<ffffffff81598f38>] register_cpu_node+0x32/0x88
Dec  8 10:41:55 bert kernel: [ 1190.095162]  [<ffffffff8158207e>] arch_register_cpu_node+0x3e/0x40
Dec  8 10:41:55 bert kernel: [ 1190.095164]  [<ffffffff8101220e>] arch_cpu_probe+0x10e/0x1f0
Dec  8 10:41:55 bert kernel: [ 1190.095166]  [<ffffffff813989d4>] cpu_probe_store+0x14/0x20
Dec  8 10:41:55 bert kernel: [ 1190.095168]  [<ffffffff81393ef0>] sysdev_class_store+0x20/0x30
Dec  8 10:41:55 bert kernel: [ 1190.095170]  [<ffffffff811cd925>] sysfs_write_file+0xe5/0x170
Dec  8 10:41:55 bert kernel: [ 1190.095172]  [<ffffffff811624c8>] vfs_write+0xc8/0x190
Dec  8 10:41:55 bert kernel: [ 1190.095174]  [<ffffffff81162e61>] sys_write+0x51/0x90
Dec  8 10:41:55 bert kernel: [ 1190.095176]  [<ffffffff8100c142>] system_call_fastpath+0x16/0x1b

Am I doing something wrong?

Thanks,
Eric


On Wed, 08 Dec 2010, Shaohui Zheng wrote:

> On Tue, Dec 07, 2010 at 11:24:20AM -0700, Eric B Munson wrote:
> > Shaohui,
> > 
> > The documentation patch seems to be stale, it needs to be updated to match the
> > new file names.
> > 
> Eric,
> 	the major change on the patchset is on the interface, for the v8 emulator,
> we accept David's per-node debugfs add_memory interface, we already included
> in the documentation patch. the change is very small, so it is not obvious.
> 
> This is the change on the documentation compare with v7:
> +3) Memory hotplug emulation:
> +
> +The emulator reserves memory before OS boots, the reserved memory region is
> +removed from e820 table. Each online node has an add_memory interface, and
> +memory can be hot-added via the per-ndoe add_memory debugfs interface.
> +
> +The difficulty of Memory Release is well-known, we have no plan for it until
> +now.
> +
> + - reserve memory thru a kernel boot paramter
> + 	mem=1024m
> +
> + - add a memory section to node 3
> +    # echo 0x40000000 > mem_hotplug/node3/add_memory
> +	OR
> +    # echo 1024m > mem_hotplug/node3/add_memory
> +
> 
> -- 
> Thanks & Regards,
> Shaohui
> 

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [1/7,v8] NUMA Hotplug Emulator: documentation
  2010-12-07 23:20       ` Shaohui Zheng
  (?)
  (?)
@ 2010-12-08 18:16       ` Eric B Munson
  2010-12-08 21:16           ` David Rientjes
  -1 siblings, 1 reply; 41+ messages in thread
From: Eric B Munson @ 2010-12-08 18:16 UTC (permalink / raw)
  To: Shaohui Zheng
  Cc: shaohui.zheng, akpm, linux-mm, linux-kernel, haicheng.li, lethal,
	ak, rientjes, dave, gregkh, Haicheng Li

[-- Attachment #1: Type: text/plain, Size: 1414 bytes --]

Shaohui,

I was able to online a cpu to node 0 successfully.  My problem was that I did
not take the cpu offline before I released it.  Everything looks to be working
for me.

Thanks for your help,
Eric
On Wed, 08 Dec 2010, Shaohui Zheng wrote:

> On Tue, Dec 07, 2010 at 11:24:20AM -0700, Eric B Munson wrote:
> > Shaohui,
> > 
> > The documentation patch seems to be stale, it needs to be updated to match the
> > new file names.
> > 
> Eric,
> 	the major change on the patchset is on the interface, for the v8 emulator,
> we accept David's per-node debugfs add_memory interface, we already included
> in the documentation patch. the change is very small, so it is not obvious.
> 
> This is the change on the documentation compare with v7:
> +3) Memory hotplug emulation:
> +
> +The emulator reserves memory before OS boots, the reserved memory region is
> +removed from e820 table. Each online node has an add_memory interface, and
> +memory can be hot-added via the per-ndoe add_memory debugfs interface.
> +
> +The difficulty of Memory Release is well-known, we have no plan for it until
> +now.
> +
> + - reserve memory thru a kernel boot paramter
> + 	mem=1024m
> +
> + - add a memory section to node 3
> +    # echo 0x40000000 > mem_hotplug/node3/add_memory
> +	OR
> +    # echo 1024m > mem_hotplug/node3/add_memory
> +
> 
> -- 
> Thanks & Regards,
> Shaohui
> 

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [1/7,v8] NUMA Hotplug Emulator: documentation
  2010-12-08 18:16       ` Eric B Munson
@ 2010-12-08 21:16           ` David Rientjes
  0 siblings, 0 replies; 41+ messages in thread
From: David Rientjes @ 2010-12-08 21:16 UTC (permalink / raw)
  To: Eric B Munson
  Cc: Shaohui Zheng, Andrew Morton, linux-mm, linux-kernel,
	haicheng.li, lethal, Andi Kleen, dave, Greg Kroah-Hartman,
	Haicheng Li

On Wed, 8 Dec 2010, Eric B Munson wrote:

> Shaohui,
> 
> I was able to online a cpu to node 0 successfully.  My problem was that I did
> not take the cpu offline before I released it.  Everything looks to be working
> for me.
> 

I think it should fail more gracefully than triggering WARN_ON()s because 
of duplicate sysfs dentries though, right?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [1/7,v8] NUMA Hotplug Emulator: documentation
  2010-12-07 23:20       ` Shaohui Zheng
@ 2010-12-08 21:18         ` David Rientjes
  -1 siblings, 0 replies; 41+ messages in thread
From: David Rientjes @ 2010-12-08 21:18 UTC (permalink / raw)
  To: Shaohui Zheng
  Cc: Eric B Munson, Andrew Morton, linux-mm, linux-kernel,
	haicheng.li, lethal, Andi Kleen, dave, Greg Kroah-Hartman,
	Haicheng Li

On Wed, 8 Dec 2010, Shaohui Zheng wrote:

> Eric,
> 	the major change in the patchset is to the interface: for the v8 emulator,
> we accepted David's per-node debugfs add_memory interface; it is already included
> in the documentation patch. The change is very small, so it is not obvious.
> 

It's still stale, as Eric mentioned: for instance, the reference to 
/sys/kernel/debug/node/add_node, which is now under mem_hotplug.  There may 
be other examples as well.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [7/7,v8] NUMA Hotplug Emulator: Implement per-node add_memory debugfs interface
  2010-12-07  1:00   ` shaohui.zheng
@ 2010-12-08 21:31     ` David Rientjes
  -1 siblings, 0 replies; 41+ messages in thread
From: David Rientjes @ 2010-12-08 21:31 UTC (permalink / raw)
  To: Shaohui Zheng
  Cc: Andrew Morton, linux-mm, linux-kernel, haicheng.li, lethal,
	Andi Kleen, dave, Greg Kroah-Hartman, Haicheng Li

On Tue, 7 Dec 2010, shaohui.zheng@intel.com wrote:

> From:  Shaohui Zheng <shaohui.zheng@intel.com>
> 
> Add an add_memory interface under debugfs to support memory hotplug emulation
> for each online node. The reserved memory can be added to the desired node with
> this interface.
> 
> The layout on debugfs:
> 	mem_hotplug/node0/add_memory
> 	mem_hotplug/node1/add_memory
> 	mem_hotplug/node2/add_memory
> 	...
> 
> Add a memory section (128M) to node 3 (booted with mem=1024m):
> 
> 	echo 0x40000000 > mem_hotplug/node3/add_memory
> 
> To make it friendlier, it is also possible to add memory with:
> 
> 	echo 1024m > mem_hotplug/node3/add_memory
> 

I don't think you should be using memparse() to support this type of 
interface; the standard way of writing memory locations is to write the 
address in hex, as the first example does.  The idea is not to try to make 
things simpler by introducing multiple ways of doing the same thing, but 
rather to standardize on a single interface.

> CC: David Rientjes <rientjes@google.com>
> CC: Dave Hansen <dave@linux.vnet.ibm.com>
> Signed-off-by: Haicheng Li <haicheng.li@intel.com>
> Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
> ---
> Index: linux-hpe4/mm/memory_hotplug.c
> ===================================================================
> --- linux-hpe4.orig/mm/memory_hotplug.c	2010-12-02 12:35:31.557622002 +0800
> +++ linux-hpe4/mm/memory_hotplug.c	2010-12-06 07:30:36.067622001 +0800
> @@ -930,6 +930,80 @@
>  
>  static struct dentry *memhp_debug_root;
>  
> +#ifdef CONFIG_ARCH_MEMORY_PROBE
> +
> +static ssize_t add_memory_store(struct file *file, const char __user *buf,
> +				size_t count, loff_t *ppos)
> +{
> +	u64 phys_addr = 0;
> +	int nid = file->private_data - NULL;
> +	int ret;
> +
> +	phys_addr = simple_strtoull(buf, NULL, 0);

This isn't doing anything.

> +	printk(KERN_INFO "Add a memory section to node: %d.\n", nid);
> +	phys_addr = memparse(buf, NULL);
> +	ret = add_memory(nid, phys_addr, PAGES_PER_SECTION << PAGE_SHIFT);

Does the add_memory() call handle memoryless nodes such that they 
appropriately transition to N_HIGH_MEMORY when memory is added?

> +
> +	if (ret)
> +		count = ret;
> +
> +	return count;
> +}
> +
> +static int add_memory_open(struct inode *inode, struct file *file)
> +{
> +	file->private_data = inode->i_private;
> +	return 0;
> +}
> +
> +static const struct file_operations add_memory_file_ops = {
> +	.open		= add_memory_open,
> +	.write		= add_memory_store,
> +	.llseek		= generic_file_llseek,
> +};
> +
> +/*
> + * Create add_memory debugfs entry under specified node
> + */
> +static int debugfs_create_add_memory_entry(int nid)
> +{
> +	char buf[32];
> +	static struct dentry *node_debug_root;
> +
> +	snprintf(buf, sizeof(buf), "node%d", nid);
> +	node_debug_root = debugfs_create_dir(buf, memhp_debug_root);

This can fail, and if it does then the subsequent debugfs_create_file() 
will be added to the debugfs root, which we don't want, so this needs 
error handling.

> +
> +	/* the nid is encoded as a pointer offset (NULL + nid) */
> +	if (!debugfs_create_file("add_memory", S_IWUSR, node_debug_root,
> +			NULL + nid, &add_memory_file_ops))
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +static int __init memory_debug_init(void)
> +{
> +	int nid;
> +
> +	if (!memhp_debug_root)
> +		memhp_debug_root = debugfs_create_dir("mem_hotplug", NULL);
> +	if (!memhp_debug_root)
> +		return -ENOMEM;
> +
> +	for_each_online_node(nid)
> +		 debugfs_create_add_memory_entry(nid);
> +
> +	return 0;
> +}
> +
> +module_init(memory_debug_init);
> +#else
> +static int debugfs_create_add_memory_entry(int nid)
> +{
> +	return 0;
> +}
> +#endif /* CONFIG_ARCH_MEMORY_PROBE */
> +
>  static ssize_t add_node_store(struct file *file, const char __user *buf,
>  				size_t count, loff_t *ppos)
>  {
> @@ -960,6 +1034,8 @@
>  		return -ENOMEM;
>  
>  	ret = add_memory(nid, start, size);
> +
> +	debugfs_create_add_memory_entry(nid);
>  	return ret ? ret : count;
>  }
>  
> Index: linux-hpe4/Documentation/memory-hotplug.txt
> ===================================================================
> --- linux-hpe4.orig/Documentation/memory-hotplug.txt	2010-12-02 12:35:31.557622002 +0800
> +++ linux-hpe4/Documentation/memory-hotplug.txt	2010-12-06 07:39:36.007622000 +0800
> @@ -19,6 +19,7 @@
>    4.1 Hardware(Firmware) Support
>    4.2 Notify memory hot-add event by hand
>    4.3 Node hotplug emulation
> +  4.4 Memory hotplug emulation
>  5. Logical Memory hot-add phase
>    5.1. State of memory
>    5.2. How to online memory
> @@ -239,6 +240,29 @@
>  Once the new node has been added, it is possible to online the memory by
>  toggling the "state" of its memory section(s) as described in section 5.1.
>  
> +4.4 Memory hotplug emulation
> +------------
> +With debugfs, it is possible to test memory hotplug in software: a memory
> +section can be added to the desired node via the add_memory interface. It is
> +a much more powerful interface than "probe" described in section 4.2.
> +
> +There is an add_memory interface for each online node at the debugfs mount
> +point.
> +	mem_hotplug/node0/add_memory
> +	mem_hotplug/node1/add_memory
> +	mem_hotplug/node2/add_memory
> +	...
> +
> +Add a memory section (128M) to node 3 (booted with mem=1024m):
> +
> +	echo 0x40000000 > mem_hotplug/node3/add_memory
> +
> +To make it friendlier, it is also possible to add memory with:
> +
> +	echo 1024m > mem_hotplug/node3/add_memory
> +
> +Once the new memory section has been added, it is possible to online the memory
> +by toggling the "state" described in section 5.1.
>  
>  ------------------------------
>  5. Logical Memory hot-add phase
> 
> -- 
> Thanks & Regards,
> Shaohui
> 
> 
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [5/7,v8] NUMA Hotplug Emulator: Support cpu probe/release in x86_64
  2010-12-07  1:00   ` shaohui.zheng
@ 2010-12-08 21:36     ` David Rientjes
  -1 siblings, 0 replies; 41+ messages in thread
From: David Rientjes @ 2010-12-08 21:36 UTC (permalink / raw)
  To: Shaohui Zheng, Tejun Heo
  Cc: Andrew Morton, linux-mm, linux-kernel, haicheng.li, lethal,
	Andi Kleen, dave, Greg Kroah-Hartman, Ingo Molnar, Len Brown,
	Yinghai Lu, Haicheng Li

On Tue, 7 Dec 2010, shaohui.zheng@intel.com wrote:

> From: Shaohui Zheng <shaohui.zheng@intel.com>
> 
> Physical CPU hot-add/hot-remove is supported on some hardware, and it is
> already supported in the current Linux kernel. NUMA Hotplug Emulator provides
> a mechanism to emulate the process in software. It can be used for testing or
> debugging purposes.
> 
> Physical CPU hotplug is different from logical CPU online/offline. Logical
> online/offline is controlled by the /sys/devices/system/cpu/cpuX/online
> interface. The CPU hotplug emulator uses the probe/release interface, which
> makes CPU hotplug automation and stress testing possible.
> 
> Add probe/release cpu interfaces under sysfs for x86_64. Users can use these
> interfaces to emulate the cpu hot-add and hot-remove process.
> 
> Directions:
> *) Reserve CPUs through a boot parameter such as:
> 	maxcpus=4
> 
> the remaining CPUs will not be initialized.
> 
> *) Probe CPU
> we can use the probe interface to hot-add new CPUs:
> 	echo nid > /sys/devices/system/cpu/probe
> 
> *) Release a CPU
> 	echo cpu > /sys/devices/system/cpu/release
> 
> A reserved CPU will be hot-added to the specified node:
> 1) nid == 0: the CPU will be added to the real node which the CPU
> should be in;
> 2) nid != 0: add the CPU to node nid even though it is a fake node.
> 

This patch is undoubtedly going to conflict with Tejun's unification of 
the 32 and 64 bit NUMA boot paths, specifically the patch at 
http://marc.info/?l=linux-kernel&m=129087151912379.

Tejun, what's the status of that patchset posted on November 27?  Any 
comments about this change?

> CC: Ingo Molnar <mingo@elte.hu>
> CC: Len Brown <len.brown@intel.com>
> CC: Yinghai Lu <Yinghai.Lu@Sun.COM>
> Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
> Signed-off-by: Haicheng Li <haicheng.li@intel.com>
> ---
> Index: linux-hpe4/arch/x86/kernel/acpi/boot.c
> ===================================================================
> --- linux-hpe4.orig/arch/x86/kernel/acpi/boot.c	2010-11-26 09:24:40.287725018 +0800
> +++ linux-hpe4/arch/x86/kernel/acpi/boot.c	2010-11-26 09:24:53.277724996 +0800
> @@ -647,8 +647,44 @@
>  }
>  EXPORT_SYMBOL(acpi_map_lsapic);
>  
> +#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
> +static void acpi_map_cpu2node_emu(int cpu, int physid, int nid)
> +{
> +#ifdef CONFIG_ACPI_NUMA
> +#ifdef CONFIG_X86_64
> +	apicid_to_node[physid] = nid;
> +	numa_set_node(cpu, nid);
> +#else /* CONFIG_X86_32 */
> +	apicid_2_node[physid] = nid;
> +	cpu_to_node_map[cpu] = nid;
> +#endif
> +#endif
> +}
> +
> +static u16 cpu_to_apicid_saved[CONFIG_NR_CPUS];
> +int __ref acpi_map_lsapic_emu(int pcpu, int nid)
> +{
> +	/* backup cpu apicid to array cpu_to_apicid_saved */
> +	if (cpu_to_apicid_saved[pcpu] == 0 &&
> +		per_cpu(x86_cpu_to_apicid, pcpu) != BAD_APICID)
> +		cpu_to_apicid_saved[pcpu] = per_cpu(x86_cpu_to_apicid, pcpu);
> +
> +	per_cpu(x86_cpu_to_apicid, pcpu) = cpu_to_apicid_saved[pcpu];
> +	acpi_map_cpu2node_emu(pcpu, per_cpu(x86_cpu_to_apicid, pcpu), nid);
> +
> +	return pcpu;
> +}
> +EXPORT_SYMBOL(acpi_map_lsapic_emu);
> +#endif
> +
>  int acpi_unmap_lsapic(int cpu)
>  {
> +#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
> +	/* backup cpu apicid to array cpu_to_apicid_saved */
> +	if (cpu_to_apicid_saved[cpu] == 0 &&
> +		per_cpu(x86_cpu_to_apicid, cpu) != BAD_APICID)
> +		cpu_to_apicid_saved[cpu] = per_cpu(x86_cpu_to_apicid, cpu);
> +#endif
>  	per_cpu(x86_cpu_to_apicid, cpu) = -1;
>  	set_cpu_present(cpu, false);
>  	num_processors--;
> Index: linux-hpe4/arch/x86/kernel/smpboot.c
> ===================================================================
> --- linux-hpe4.orig/arch/x86/kernel/smpboot.c	2010-11-26 09:24:40.297724969 +0800
> +++ linux-hpe4/arch/x86/kernel/smpboot.c	2010-11-26 12:48:58.977725001 +0800
> @@ -107,8 +107,6 @@
>          mutex_unlock(&x86_cpu_hotplug_driver_mutex);
>  }
>  
> -ssize_t arch_cpu_probe(const char *buf, size_t count) { return -1; }
> -ssize_t arch_cpu_release(const char *buf, size_t count) { return -1; }
>  #else
>  static struct task_struct *idle_thread_array[NR_CPUS] __cpuinitdata ;
>  #define get_idle_for_cpu(x)      (idle_thread_array[(x)])
> Index: linux-hpe4/arch/x86/kernel/topology.c
> ===================================================================
> --- linux-hpe4.orig/arch/x86/kernel/topology.c	2010-11-26 09:24:52.477725000 +0800
> +++ linux-hpe4/arch/x86/kernel/topology.c	2010-11-26 12:48:58.987725001 +0800
> @@ -30,6 +30,9 @@
>  #include <linux/init.h>
>  #include <linux/smp.h>
>  #include <asm/cpu.h>
> +#include <linux/cpu.h>
> +#include <linux/topology.h>
> +#include <linux/acpi.h>
>  
>  static DEFINE_PER_CPU(struct x86_cpu, cpu_devices);
>  
> @@ -66,6 +69,74 @@
>  	unregister_cpu(&per_cpu(cpu_devices, num).cpu);
>  }
>  EXPORT_SYMBOL(arch_unregister_cpu);
> +
> +ssize_t arch_cpu_probe(const char *buf, size_t count)
> +{
> +	int nid = 0;
> +	int num = 0, selected = 0;
> +
> +	/* check parameters */
> +	if (!buf || count < 2)
> +		return -EPERM;
> +
> +	nid = simple_strtoul(buf, NULL, 0);
> +	printk(KERN_DEBUG "Add a cpu to node : %d\n", nid);
> +
> +	if (nid < 0 || nid > nr_node_ids - 1) {
> +		printk(KERN_ERR "Invalid NUMA node id: %d (0 <= nid < %d).\n",
> +			nid, nr_node_ids);
> +		return -EPERM;
> +	}
> +
> +	if (!node_online(nid)) {
> +		printk(KERN_ERR "NUMA node %d is not online, give up.\n", nid);
> +		return -EPERM;
> +	}
> +
> +	/* find first uninitialized cpu */
> +	for_each_present_cpu(num) {
> +		if (per_cpu(cpu_sys_devices, num) == NULL) {
> +			selected = num;
> +			break;
> +		}
> +	}
> +
> +	if (selected >= num_possible_cpus()) {
> +		printk(KERN_ERR "No free cpu, give up cpu probing.\n");
> +		return -EPERM;
> +	}
> +
> +	/* register cpu */
> +	arch_register_cpu_node(selected, nid);
> +	acpi_map_lsapic_emu(selected, nid);
> +
> +	return count;
> +}
> +EXPORT_SYMBOL(arch_cpu_probe);
> +
> +ssize_t arch_cpu_release(const char *buf, size_t count)
> +{
> +	int cpu = 0;
> +
> +	cpu =  simple_strtoul(buf, NULL, 0);
> +	/* cpu 0 is not hotplugable */
> +	if (cpu == 0) {
> +		printk(KERN_ERR "can not release cpu 0.\n");
> +		return -EPERM;
> +	}
> +
> +	if (cpu_online(cpu)) {
> +		printk(KERN_DEBUG "offline cpu %d.\n", cpu);
> +		cpu_down(cpu);
> +	}
> +
> +	arch_unregister_cpu(cpu);
> +	acpi_unmap_lsapic(cpu);
> +
> +	return count;
> +}
> +EXPORT_SYMBOL(arch_cpu_release);
> +
>  #else /* CONFIG_HOTPLUG_CPU */
>  
>  static int __init arch_register_cpu(int num)
> @@ -83,8 +154,14 @@
>  		register_one_node(i);
>  #endif
>  
> -	for_each_present_cpu(i)
> -		arch_register_cpu(i);
> +	/*
> +	 * When cpu hotplug emulation is enabled, register only the online
> +	 * cpus; the rest are reserved for cpu probe.
> +	 */
> +	for_each_present_cpu(i) {
> +		if ((cpu_hpe_on && cpu_online(i)) || !cpu_hpe_on)
> +			arch_register_cpu(i);
> +	}
>  
>  	return 0;
>  }
> Index: linux-hpe4/arch/x86/mm/numa_64.c
> ===================================================================
> --- linux-hpe4.orig/arch/x86/mm/numa_64.c	2010-11-26 09:24:40.317724965 +0800
> +++ linux-hpe4/arch/x86/mm/numa_64.c	2010-11-26 09:24:53.297725001 +0800
> @@ -12,6 +12,7 @@
>  #include <linux/module.h>
>  #include <linux/nodemask.h>
>  #include <linux/sched.h>
> +#include <linux/cpu.h>
>  
>  #include <asm/e820.h>
>  #include <asm/proto.h>
> @@ -785,6 +786,19 @@
>  }
>  #endif
>  
> +#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
> +static __init int cpu_hpe_setup(char *opt)
> +{
> +	if (!opt)
> +		return -EINVAL;
> +
> +	if (!strncmp(opt, "on", 2) || !strncmp(opt, "1", 1))
> +		cpu_hpe_on = 1;
> +
> +	return 0;
> +}
> +early_param("cpu_hpe", cpu_hpe_setup);
> +#endif  /* CONFIG_ARCH_CPU_PROBE_RELEASE */
>  
>  void __cpuinit numa_set_node(int cpu, int node)
>  {
> Index: linux-hpe4/drivers/acpi/processor_driver.c
> ===================================================================
> --- linux-hpe4.orig/drivers/acpi/processor_driver.c	2010-11-26 09:24:40.327725004 +0800
> +++ linux-hpe4/drivers/acpi/processor_driver.c	2010-11-26 09:24:53.297725001 +0800
> @@ -530,6 +530,14 @@
>  		goto err_free_cpumask;
>  
>  	sysdev = get_cpu_sysdev(pr->id);
> +	/*
> +	 * Reserve the cpu for hotplug emulation; the reserved cpu can be
> +	 * hot-added through the cpu probe interface. Return directly.
> +	 */
> +	if (sysdev == NULL) {
> +		goto out;
> +	}
> +
>  	if (sysfs_create_link(&device->dev.kobj, &sysdev->kobj, "sysdev")) {
>  		result = -EFAULT;
>  		goto err_remove_fs;
> @@ -570,6 +578,7 @@
>  		goto err_remove_sysfs;
>  	}
>  
> +out:
>  	return 0;
>  
>  err_remove_sysfs:
> Index: linux-hpe4/drivers/base/cpu.c
> ===================================================================
> --- linux-hpe4.orig/drivers/base/cpu.c	2010-11-26 09:24:52.477725000 +0800
> +++ linux-hpe4/drivers/base/cpu.c	2010-11-26 09:24:53.297725001 +0800
> @@ -22,9 +22,15 @@
>  };
>  EXPORT_SYMBOL(cpu_sysdev_class);
>  
> -static DEFINE_PER_CPU(struct sys_device *, cpu_sys_devices);
> +DEFINE_PER_CPU(struct sys_device *, cpu_sys_devices);
>  
>  #ifdef CONFIG_HOTPLUG_CPU
> +/*
> + * cpu_hpe_on is a switch to enable/disable cpu hotplug emulation. It is
> + * disabled by default; it can be enabled through the boot parameter cpu_hpe=on
> + */
> +int cpu_hpe_on;
> +
>  static ssize_t show_online(struct sys_device *dev, struct sysdev_attribute *attr,
>  			   char *buf)
>  {
> Index: linux-hpe4/include/linux/acpi.h
> ===================================================================
> --- linux-hpe4.orig/include/linux/acpi.h	2010-11-26 09:24:40.347725041 +0800
> +++ linux-hpe4/include/linux/acpi.h	2010-11-26 09:24:53.297725001 +0800
> @@ -102,6 +102,7 @@
>  #ifdef CONFIG_ACPI_HOTPLUG_CPU
>  /* Arch dependent functions for cpu hotplug support */
>  int acpi_map_lsapic(acpi_handle handle, int *pcpu);
> +int acpi_map_lsapic_emu(int pcpu, int nid);
>  int acpi_unmap_lsapic(int cpu);
>  #endif /* CONFIG_ACPI_HOTPLUG_CPU */
>  
> Index: linux-hpe4/include/linux/cpu.h
> ===================================================================
> --- linux-hpe4.orig/include/linux/cpu.h	2010-11-26 09:24:52.477725000 +0800
> +++ linux-hpe4/include/linux/cpu.h	2010-11-26 09:24:53.297725001 +0800
> @@ -30,6 +30,8 @@
>  	struct sys_device sysdev;
>  };
>  
> +DECLARE_PER_CPU(struct sys_device *, cpu_sys_devices);
> +
>  extern int register_cpu_node(struct cpu *cpu, int num, int nid);
>  
>  static inline int register_cpu(struct cpu *cpu, int num)
> @@ -149,6 +151,7 @@
>  #define register_hotcpu_notifier(nb)	register_cpu_notifier(nb)
>  #define unregister_hotcpu_notifier(nb)	unregister_cpu_notifier(nb)
>  int cpu_down(unsigned int cpu);
> +extern int cpu_hpe_on;
>  
>  #ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
>  extern void cpu_hotplug_driver_lock(void);
> @@ -171,6 +174,7 @@
>  /* These aren't inline functions due to a GCC bug. */
>  #define register_hotcpu_notifier(nb)	({ (void)(nb); 0; })
>  #define unregister_hotcpu_notifier(nb)	({ (void)(nb); })
> +static int cpu_hpe_on;
>  #endif		/* CONFIG_HOTPLUG_CPU */
>  
>  #ifdef CONFIG_PM_SLEEP_SMP
> Index: linux-hpe4/Documentation/x86/x86_64/boot-options.txt
> ===================================================================
> --- linux-hpe4.orig/Documentation/x86/x86_64/boot-options.txt	2010-11-26 12:49:44.847725099 +0800
> +++ linux-hpe4/Documentation/x86/x86_64/boot-options.txt	2010-11-26 12:55:50.527724999 +0800
> @@ -316,3 +316,9 @@
>  		Do not use GB pages for kernel direct mappings.
>  	gbpages
>  		Use GB pages for kernel direct mappings.
> +	cpu_hpe=on/off
> +		Enable/disable CPU hotplug emulation in software. When cpu_hpe=on,
> +		sysfs provides probe/release interface to hot add/remove CPUs dynamically.
> +		We can use maxcpus=<N> to reserve CPUs.
> +		This option is disabled by default.
> +			
> 
> -- 
> Thanks & Regards,
> Shaohui
> 
> 
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [5/7,v8] NUMA Hotplug Emulator: Support cpu probe/release in x86_64
@ 2010-12-08 21:36     ` David Rientjes
  0 siblings, 0 replies; 41+ messages in thread
From: David Rientjes @ 2010-12-08 21:36 UTC (permalink / raw)
  To: Shaohui Zheng, Tejun Heo
  Cc: Andrew Morton, linux-mm, linux-kernel, haicheng.li, lethal,
	Andi Kleen, dave, Greg Kroah-Hartman, Ingo Molnar, Len Brown,
	Yinghai Lu, Haicheng Li

On Tue, 7 Dec 2010, shaohui.zheng@intel.com wrote:

> From: Shaohui Zheng <shaohui.zheng@intel.com>
> 
> CPU physical hot-add/hot-remove are supported on some hardwares, and it 
> was already supported in current linux kernel. NUMA Hotplug Emulator provides
> a mechanism to emulate the process with software method. It can be used for
> testing or debuging purpose.
> 
> CPU physical hotplug is different with logical CPU online/offline. Logical
> online/offline is controled by interface /sys/device/cpu/cpuX/online. CPU
> hotplug emulator uses probe/release interface. It becomes possible to do cpu
> hotplug automation and stress
> 
> Add cpu interface probe/release under sysfs for x86_64. User can use this
> interface to emulate the cpu hot-add and hot-remove process.
> 
> Directive:
> *) Reserve CPU thru grub parameter like:
> 	maxcpus=4
> 
> the rest CPUs will not be initiliazed. 
> 
> *) Probe CPU
> we can use the probe interface to hot-add new CPUs:
> 	echo nid > /sys/devices/system/cpu/probe
> 
> *) Release a CPU
> 	echo cpu > /sys/devices/system/cpu/release
> 
> A reserved CPU will be hot-added to the specified node.
> 1) nid == 0, the CPU will be added to the real node which the CPU
> should be in
> 2) nid != 0, add the CPU to node nid even through it is a fake node.
> 

This patch is undoubtedly going to conflict with Tejun's unification of 
the 32 and 64 bit NUMA boot paths, specifically the patch at 
http://marc.info/?l=linux-kernel&m=129087151912379.

Tejun, what's the status of that patchset posted on November 27?  Any 
comments about this change?

> CC: Ingo Molnar <mingo@elte.hu>
> CC: Len Brown <len.brown@intel.com>
> CC: Yinghai Lu <Yinghai.Lu@Sun.COM>
> Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
> Signed-off-by: Haicheng Li <haicheng.li@intel.com>
> ---
> Index: linux-hpe4/arch/x86/kernel/acpi/boot.c
> ===================================================================
> --- linux-hpe4.orig/arch/x86/kernel/acpi/boot.c	2010-11-26 09:24:40.287725018 +0800
> +++ linux-hpe4/arch/x86/kernel/acpi/boot.c	2010-11-26 09:24:53.277724996 +0800
> @@ -647,8 +647,44 @@
>  }
>  EXPORT_SYMBOL(acpi_map_lsapic);
>  
> +#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
> +static void acpi_map_cpu2node_emu(int cpu, int physid, int nid)
> +{
> +#ifdef CONFIG_ACPI_NUMA
> +#ifdef CONFIG_X86_64
> +	apicid_to_node[physid] = nid;
> +	numa_set_node(cpu, nid);
> +#else /* CONFIG_X86_32 */
> +	apicid_2_node[physid] = nid;
> +	cpu_to_node_map[cpu] = nid;
> +#endif
> +#endif
> +}
> +
> +static u16 cpu_to_apicid_saved[CONFIG_NR_CPUS];
> +int __ref acpi_map_lsapic_emu(int pcpu, int nid)
> +{
> +	/* backup cpu apicid to array cpu_to_apicid_saved */
> +	if (cpu_to_apicid_saved[pcpu] == 0 &&
> +		per_cpu(x86_cpu_to_apicid, pcpu) != BAD_APICID)
> +		cpu_to_apicid_saved[pcpu] = per_cpu(x86_cpu_to_apicid, pcpu);
> +
> +	per_cpu(x86_cpu_to_apicid, pcpu) = cpu_to_apicid_saved[pcpu];
> +	acpi_map_cpu2node_emu(pcpu, per_cpu(x86_cpu_to_apicid, pcpu), nid);
> +
> +	return pcpu;
> +}
> +EXPORT_SYMBOL(acpi_map_lsapic_emu);
> +#endif
> +
>  int acpi_unmap_lsapic(int cpu)
>  {
> +#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
> +	/* backup cpu apicid to array cpu_to_apicid_saved */
> +	if (cpu_to_apicid_saved[cpu] == 0 &&
> +		per_cpu(x86_cpu_to_apicid, cpu) != BAD_APICID)
> +		cpu_to_apicid_saved[cpu] = per_cpu(x86_cpu_to_apicid, cpu);
> +#endif
>  	per_cpu(x86_cpu_to_apicid, cpu) = -1;
>  	set_cpu_present(cpu, false);
>  	num_processors--;
> Index: linux-hpe4/arch/x86/kernel/smpboot.c
> ===================================================================
> --- linux-hpe4.orig/arch/x86/kernel/smpboot.c	2010-11-26 09:24:40.297724969 +0800
> +++ linux-hpe4/arch/x86/kernel/smpboot.c	2010-11-26 12:48:58.977725001 +0800
> @@ -107,8 +107,6 @@
>          mutex_unlock(&x86_cpu_hotplug_driver_mutex);
>  }
>  
> -ssize_t arch_cpu_probe(const char *buf, size_t count) { return -1; }
> -ssize_t arch_cpu_release(const char *buf, size_t count) { return -1; }
>  #else
>  static struct task_struct *idle_thread_array[NR_CPUS] __cpuinitdata ;
>  #define get_idle_for_cpu(x)      (idle_thread_array[(x)])
> Index: linux-hpe4/arch/x86/kernel/topology.c
> ===================================================================
> --- linux-hpe4.orig/arch/x86/kernel/topology.c	2010-11-26 09:24:52.477725000 +0800
> +++ linux-hpe4/arch/x86/kernel/topology.c	2010-11-26 12:48:58.987725001 +0800
> @@ -30,6 +30,9 @@
>  #include <linux/init.h>
>  #include <linux/smp.h>
>  #include <asm/cpu.h>
> +#include <linux/cpu.h>
> +#include <linux/topology.h>
> +#include <linux/acpi.h>
>  
>  static DEFINE_PER_CPU(struct x86_cpu, cpu_devices);
>  
> @@ -66,6 +69,74 @@
>  	unregister_cpu(&per_cpu(cpu_devices, num).cpu);
>  }
>  EXPORT_SYMBOL(arch_unregister_cpu);
> +
> +ssize_t arch_cpu_probe(const char *buf, size_t count)
> +{
> +	int nid = 0;
> +	int num = 0, selected = -1;
> +
> +	/* check parameters */
> +	if (!buf || count < 2)
> +		return -EPERM;
> +
> +	nid = simple_strtoul(buf, NULL, 0);
> +	printk(KERN_DEBUG "Add a cpu to node: %d\n", nid);
> +
> +	if (nid < 0 || nid > nr_node_ids - 1) {
> +		printk(KERN_ERR "Invalid NUMA node id: %d (0 <= nid < %d).\n",
> +			nid, nr_node_ids);
> +		return -EPERM;
> +	}
> +
> +	if (!node_online(nid)) {
> +		printk(KERN_ERR "NUMA node %d is not online, give up.\n", nid);
> +		return -EPERM;
> +	}
> +
> +	/* find first uninitialized cpu */
> +	for_each_present_cpu(num) {
> +		if (per_cpu(cpu_sys_devices, num) == NULL) {
> +			selected = num;
> +			break;
> +		}
> +	}
> +
> +	if (selected < 0) {
> +		printk(KERN_ERR "No free cpu, give up cpu probing.\n");
> +		return -EPERM;
> +	}
> +
> +	/* register cpu */
> +	arch_register_cpu_node(selected, nid);
> +	acpi_map_lsapic_emu(selected, nid);
> +
> +	return count;
> +}
> +EXPORT_SYMBOL(arch_cpu_probe);
> +
> +ssize_t arch_cpu_release(const char *buf, size_t count)
> +{
> +	int cpu = 0;
> +
> +	cpu = simple_strtoul(buf, NULL, 0);
> +	/* cpu 0 is not hotpluggable */
> +	if (cpu == 0) {
> +		printk(KERN_ERR "cannot release cpu 0.\n");
> +		return -EPERM;
> +	}
> +
> +	if (cpu_online(cpu)) {
> +		printk(KERN_DEBUG "offline cpu %d.\n", cpu);
> +		cpu_down(cpu);
> +	}
> +
> +	arch_unregister_cpu(cpu);
> +	acpi_unmap_lsapic(cpu);
> +
> +	return count;
> +}
> +EXPORT_SYMBOL(arch_cpu_release);
> +
>  #else /* CONFIG_HOTPLUG_CPU */
>  
>  static int __init arch_register_cpu(int num)
> @@ -83,8 +154,14 @@
>  		register_one_node(i);
>  #endif
>  
> -	for_each_present_cpu(i)
> -		arch_register_cpu(i);
> +	/*
> +	 * When cpu hotplug emulation is enabled, register only the online
> +	 * cpus; the rest are reserved for cpu probe.
> +	 */
> +	for_each_present_cpu(i) {
> +		if (!cpu_hpe_on || cpu_online(i))
> +			arch_register_cpu(i);
> +	}
>  
>  	return 0;
>  }
> Index: linux-hpe4/arch/x86/mm/numa_64.c
> ===================================================================
> --- linux-hpe4.orig/arch/x86/mm/numa_64.c	2010-11-26 09:24:40.317724965 +0800
> +++ linux-hpe4/arch/x86/mm/numa_64.c	2010-11-26 09:24:53.297725001 +0800
> @@ -12,6 +12,7 @@
>  #include <linux/module.h>
>  #include <linux/nodemask.h>
>  #include <linux/sched.h>
> +#include <linux/cpu.h>
>  
>  #include <asm/e820.h>
>  #include <asm/proto.h>
> @@ -785,6 +786,19 @@
>  }
>  #endif
>  
> +#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
> +static __init int cpu_hpe_setup(char *opt)
> +{
> +	if (!opt)
> +		return -EINVAL;
> +
> +	if (!strncmp(opt, "on", 2) || !strncmp(opt, "1", 1))
> +		cpu_hpe_on = 1;
> +
> +	return 0;
> +}
> +early_param("cpu_hpe", cpu_hpe_setup);
> +#endif  /* CONFIG_ARCH_CPU_PROBE_RELEASE */
>  
>  void __cpuinit numa_set_node(int cpu, int node)
>  {
> Index: linux-hpe4/drivers/acpi/processor_driver.c
> ===================================================================
> --- linux-hpe4.orig/drivers/acpi/processor_driver.c	2010-11-26 09:24:40.327725004 +0800
> +++ linux-hpe4/drivers/acpi/processor_driver.c	2010-11-26 09:24:53.297725001 +0800
> @@ -530,6 +530,14 @@
>  		goto err_free_cpumask;
>  
>  	sysdev = get_cpu_sysdev(pr->id);
> +	/*
> +	 * Reserved for cpu hotplug emulation: a reserved cpu can be hot-added
> +	 * through the cpu probe interface.  Return directly.
> +	 */
> +	if (sysdev == NULL) {
> +		goto out;
> +	}
> +
>  	if (sysfs_create_link(&device->dev.kobj, &sysdev->kobj, "sysdev")) {
>  		result = -EFAULT;
>  		goto err_remove_fs;
> @@ -570,6 +578,7 @@
>  		goto err_remove_sysfs;
>  	}
>  
> +out:
>  	return 0;
>  
>  err_remove_sysfs:
> Index: linux-hpe4/drivers/base/cpu.c
> ===================================================================
> --- linux-hpe4.orig/drivers/base/cpu.c	2010-11-26 09:24:52.477725000 +0800
> +++ linux-hpe4/drivers/base/cpu.c	2010-11-26 09:24:53.297725001 +0800
> @@ -22,9 +22,15 @@
>  };
>  EXPORT_SYMBOL(cpu_sysdev_class);
>  
> -static DEFINE_PER_CPU(struct sys_device *, cpu_sys_devices);
> +DEFINE_PER_CPU(struct sys_device *, cpu_sys_devices);
>  
>  #ifdef CONFIG_HOTPLUG_CPU
> +/*
> + * cpu_hpe_on is a switch to enable/disable cpu hotplug emulation. It is
> + * disabled by default and can be enabled through the boot parameter cpu_hpe=on.
> + */
> +int cpu_hpe_on;
> +
>  static ssize_t show_online(struct sys_device *dev, struct sysdev_attribute *attr,
>  			   char *buf)
>  {
> Index: linux-hpe4/include/linux/acpi.h
> ===================================================================
> --- linux-hpe4.orig/include/linux/acpi.h	2010-11-26 09:24:40.347725041 +0800
> +++ linux-hpe4/include/linux/acpi.h	2010-11-26 09:24:53.297725001 +0800
> @@ -102,6 +102,7 @@
>  #ifdef CONFIG_ACPI_HOTPLUG_CPU
>  /* Arch dependent functions for cpu hotplug support */
>  int acpi_map_lsapic(acpi_handle handle, int *pcpu);
> +int acpi_map_lsapic_emu(int pcpu, int nid);
>  int acpi_unmap_lsapic(int cpu);
>  #endif /* CONFIG_ACPI_HOTPLUG_CPU */
>  
> Index: linux-hpe4/include/linux/cpu.h
> ===================================================================
> --- linux-hpe4.orig/include/linux/cpu.h	2010-11-26 09:24:52.477725000 +0800
> +++ linux-hpe4/include/linux/cpu.h	2010-11-26 09:24:53.297725001 +0800
> @@ -30,6 +30,8 @@
>  	struct sys_device sysdev;
>  };
>  
> +DECLARE_PER_CPU(struct sys_device *, cpu_sys_devices);
> +
>  extern int register_cpu_node(struct cpu *cpu, int num, int nid);
>  
>  static inline int register_cpu(struct cpu *cpu, int num)
> @@ -149,6 +151,7 @@
>  #define register_hotcpu_notifier(nb)	register_cpu_notifier(nb)
>  #define unregister_hotcpu_notifier(nb)	unregister_cpu_notifier(nb)
>  int cpu_down(unsigned int cpu);
> +extern int cpu_hpe_on;
>  
>  #ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
>  extern void cpu_hotplug_driver_lock(void);
> @@ -171,6 +174,7 @@
>  /* These aren't inline functions due to a GCC bug. */
>  #define register_hotcpu_notifier(nb)	({ (void)(nb); 0; })
>  #define unregister_hotcpu_notifier(nb)	({ (void)(nb); })
> +static int cpu_hpe_on;
>  #endif		/* CONFIG_HOTPLUG_CPU */
>  
>  #ifdef CONFIG_PM_SLEEP_SMP
> Index: linux-hpe4/Documentation/x86/x86_64/boot-options.txt
> ===================================================================
> --- linux-hpe4.orig/Documentation/x86/x86_64/boot-options.txt	2010-11-26 12:49:44.847725099 +0800
> +++ linux-hpe4/Documentation/x86/x86_64/boot-options.txt	2010-11-26 12:55:50.527724999 +0800
> @@ -316,3 +316,9 @@
>  		Do not use GB pages for kernel direct mappings.
>  	gbpages
>  		Use GB pages for kernel direct mappings.
> +	cpu_hpe=on/off
> +		Enable/disable CPU hotplug emulation with software method. When cpu_hpe=on,
> +		sysfs provides probe/release interface to hot add/remove CPUs dynamically.
> +		We can use maxcpus=<N> to reserve CPUs.
> +		This option is disabled by default.
> +
> 
> -- 
> Thanks & Regards,
> Shaohui
> 
> 
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [1/7,v8] NUMA Hotplug Emulator: documentation
  2010-12-08 17:46       ` Eric B Munson
@ 2010-12-09  0:09           ` Shaohui Zheng
  0 siblings, 0 replies; 41+ messages in thread
From: Shaohui Zheng @ 2010-12-09  0:09 UTC (permalink / raw)
  To: Eric B Munson
  Cc: Shaohui Zheng, akpm, linux-mm, linux-kernel, haicheng.li, lethal,
	ak, rientjes, dave, gregkh, Haicheng Li

On Wed, Dec 08, 2010 at 10:46:33AM -0700, Eric B Munson wrote:
> Shaohui,
> 
> I have had some success.  I had run into confusion on the memory hotplug with 
> which files to be using to online memory.  The latest patch sorted it out for me
> and I can now online disabled memory in new nodes.  I still cannot online an offlined
> cpu.  Of the 12 available thread, I have 8 activated on boot with the kernel command line:
> 
> mem=8G numa=possible=12 maxcpus=8 cpu_hpe=on
> 
> I can offline a CPU just fine according to the kernel:
> root@bert:/sys/devices/system/cpu# echo 7 > release
> (dmesg)
> [  911.494852] offline cpu 7.
> [  911.694323] CPU 7 is now offline
> 
> But when I try and re-add it I get an error:
> root@bert:/sys/devices/system/cpu# echo 0 > probe
> (dmesg)
> Dec  8 10:41:55 bert kernel: [ 1190.095051] ------------[ cut here ]------------
> Dec  8 10:41:55 bert kernel: [ 1190.095056] WARNING: at fs/sysfs/dir.c:451 sysfs_add_one+0xce/0x180()
> Dec  8 10:41:55 bert kernel: [ 1190.095057] Hardware name: System Product Name
> Dec  8 10:41:55 bert kernel: [ 1190.095058] sysfs: cannot create duplicate filename '/devices/system/cpu/cpu7'
> Dec  8 10:41:55 bert kernel: [ 1190.095060] Modules linked in: nfs binfmt_misc lockd fscache nfs_acl auth_rpcgss sunrpc snd_hda_codec_hdmi snd_hda_codec_realtek radeon snd_hda_intel snd_hda_codec snd_cmipci gameport snd_pcm ttm snd_opl3_lib drm_kms_helper snd_hwdep snd_mpu401_uart drm uvcvideo snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq xhci_hcd snd_timer videodev snd_seq_device snd psmouse i7core_edac i2c_algo_bit edac_core joydev v4l1_compat shpchp snd_page_alloc v4l2_compat_ioctl32 soundcore hwmon_vid asus_atk0110 max6650 serio_raw hid_microsoft usbhid hid firewire_ohci firewire_core crc_itu_t ahci sky2 libahci
> Dec  8 10:41:55 bert kernel: [ 1190.095088] Pid: 2369, comm: bash Tainted: G        W   2.6.37-rc5-numa-test+ #3
> Dec  8 10:41:55 bert kernel: [ 1190.095089] Call Trace:
> Dec  8 10:41:55 bert kernel: [ 1190.095094]  [<ffffffff8105eb1f>] warn_slowpath_common+0x7f/0xc0
> Dec  8 10:41:55 bert kernel: [ 1190.095096]  [<ffffffff8105ec16>] warn_slowpath_fmt+0x46/0x50
> Dec  8 10:41:55 bert kernel: [ 1190.095098]  [<ffffffff811cf77e>] sysfs_add_one+0xce/0x180
> Dec  8 10:41:55 bert kernel: [ 1190.095100]  [<ffffffff811cf8b1>] create_dir+0x81/0xd0
> Dec  8 10:41:55 bert kernel: [ 1190.095102]  [<ffffffff811cf97d>] sysfs_create_dir+0x7d/0xd0
> Dec  8 10:41:55 bert kernel: [ 1190.095106]  [<ffffffff815a2b3d>] ? sub_preempt_count+0x9d/0xd0
> Dec  8 10:41:55 bert kernel: [ 1190.095109]  [<ffffffff812c9ffd>] kobject_add_internal+0xbd/0x200
> Dec  8 10:41:55 bert kernel: [ 1190.095111]  [<ffffffff812ca258>] kobject_add_varg+0x38/0x60
> Dec  8 10:41:55 bert kernel: [ 1190.095113]  [<ffffffff812ca2d3>] kobject_init_and_add+0x53/0x70
> Dec  8 10:41:55 bert kernel: [ 1190.095117]  [<ffffffff8139475f>] sysdev_register+0x6f/0xf0
> Dec  8 10:41:55 bert kernel: [ 1190.095121]  [<ffffffff81598f38>] register_cpu_node+0x32/0x88
> Dec  8 10:41:55 bert kernel: [ 1190.095123]  [<ffffffff8158207e>] arch_register_cpu_node+0x3e/0x40
> Dec  8 10:41:55 bert kernel: [ 1190.095127]  [<ffffffff8101220e>] arch_cpu_probe+0x10e/0x1f0
> Dec  8 10:41:55 bert kernel: [ 1190.095129]  [<ffffffff813989d4>] cpu_probe_store+0x14/0x20
> Dec  8 10:41:55 bert kernel: [ 1190.095131]  [<ffffffff81393ef0>] sysdev_class_store+0x20/0x30
> Dec  8 10:41:55 bert kernel: [ 1190.095133]  [<ffffffff811cd925>] sysfs_write_file+0xe5/0x170
> Dec  8 10:41:55 bert kernel: [ 1190.095137]  [<ffffffff811624c8>] vfs_write+0xc8/0x190
> Dec  8 10:41:55 bert kernel: [ 1190.095139]  [<ffffffff81162e61>] sys_write+0x51/0x90
> Dec  8 10:41:55 bert kernel: [ 1190.095142]  [<ffffffff8100c142>] system_call_fastpath+0x16/0x1b
> Dec  8 10:41:55 bert kernel: [ 1190.095144] ---[ end trace f615c2a524d318ea ]---
> Dec  8 10:41:55 bert kernel: [ 1190.095149] Pid: 2369, comm: bash Tainted: G        W   2.6.37-rc5-numa-test+ #3
> Dec  8 10:41:55 bert kernel: [ 1190.095150] Call Trace:
> Dec  8 10:41:55 bert kernel: [ 1190.095152]  [<ffffffff812ca09b>] kobject_add_internal+0x15b/0x200
> Dec  8 10:41:55 bert kernel: [ 1190.095154]  [<ffffffff812ca258>] kobject_add_varg+0x38/0x60
> Dec  8 10:41:55 bert kernel: [ 1190.095156]  [<ffffffff812ca2d3>] kobject_init_and_add+0x53/0x70
> Dec  8 10:41:55 bert kernel: [ 1190.095158]  [<ffffffff8139475f>] sysdev_register+0x6f/0xf0
> Dec  8 10:41:55 bert kernel: [ 1190.095160]  [<ffffffff81598f38>] register_cpu_node+0x32/0x88
> Dec  8 10:41:55 bert kernel: [ 1190.095162]  [<ffffffff8158207e>] arch_register_cpu_node+0x3e/0x40
> Dec  8 10:41:55 bert kernel: [ 1190.095164]  [<ffffffff8101220e>] arch_cpu_probe+0x10e/0x1f0
> Dec  8 10:41:55 bert kernel: [ 1190.095166]  [<ffffffff813989d4>] cpu_probe_store+0x14/0x20
> Dec  8 10:41:55 bert kernel: [ 1190.095168]  [<ffffffff81393ef0>] sysdev_class_store+0x20/0x30
> Dec  8 10:41:55 bert kernel: [ 1190.095170]  [<ffffffff811cd925>] sysfs_write_file+0xe5/0x170
> Dec  8 10:41:55 bert kernel: [ 1190.095172]  [<ffffffff811624c8>] vfs_write+0xc8/0x190
> Dec  8 10:41:55 bert kernel: [ 1190.095174]  [<ffffffff81162e61>] sys_write+0x51/0x90
> Dec  8 10:41:55 bert kernel: [ 1190.095176]  [<ffffffff8100c142>] system_call_fastpath+0x16/0x1b
> 
> Am I doing something wrong?
> 
> Thanks,
> Eric

Eric,
	I saw that you already got this issue solved in another email; that is good. I double-checked your steps and did not find any problems.

The logic for CPU release (arch_cpu_release) is:
1) offline the CPU if the CPU is online
2) unregister the CPU

So even if the CPU is online, you can still release it directly. I should check the return value after calling cpu_down().

How about adding the following check?

--- arch/x86/kernel/topology.c-orig	2010-12-09 08:03:19.883331001 +0800
+++ arch/x86/kernel/topology.c	2010-12-09 08:01:35.993331000 +0800
@@ -158,7 +158,10 @@
 
 	if (cpu_online(cpu)) {
 		printk(KERN_DEBUG "offline cpu %d.\n", cpu);
-		cpu_down(cpu);
+		if (cpu_down(cpu)) {
+			printk(KERN_ERR "failed to offline cpu %d, give up.\n", cpu);
+			return -EPERM;
+		}
 	}
 
 	arch_unregister_cpu(cpu);

-- 
Thanks & Regards,
Shaohui




* Re: [1/7,v8] NUMA Hotplug Emulator: documentation
  2010-12-08 21:16           ` David Rientjes
@ 2010-12-09  0:23             ` Shaohui Zheng
  -1 siblings, 0 replies; 41+ messages in thread
From: Shaohui Zheng @ 2010-12-09  0:23 UTC (permalink / raw)
  To: David Rientjes
  Cc: Eric B Munson, Shaohui Zheng, Andrew Morton, linux-mm,
	linux-kernel, haicheng.li, lethal, Andi Kleen, dave,
	Greg Kroah-Hartman, Haicheng Li

On Wed, Dec 08, 2010 at 01:16:10PM -0800, David Rientjes wrote:
> On Wed, 8 Dec 2010, Eric B Munson wrote:
> 
> > Shaohui,
> > 
> > I was able to online a cpu to node 0 successfully.  My problem was that I did
> > not take the cpu offline before I released it.  Everything looks to be working
> > for me.
> > 
> 
> I think it should fail more gracefully than triggering WARN_ON()s because 
> of duplicate sysfs dentries though, right?

Yes, we should do more checking on the return values; the duplicate dentries can
be avoided.

Another solution: force the user to offline the cpu before we do the cpu release.

-- 
Thanks & Regards,
Shaohui




* Re: [1/7,v8] NUMA Hotplug Emulator: documentation
  2010-12-08 21:18         ` David Rientjes
@ 2010-12-09  0:33           ` Shaohui Zheng
  -1 siblings, 0 replies; 41+ messages in thread
From: Shaohui Zheng @ 2010-12-09  0:33 UTC (permalink / raw)
  To: David Rientjes
  Cc: Shaohui Zheng, Eric B Munson, Andrew Morton, linux-mm,
	linux-kernel, haicheng.li, lethal, Andi Kleen, dave,
	Greg Kroah-Hartman, Haicheng Li

On Wed, Dec 08, 2010 at 01:18:02PM -0800, David Rientjes wrote:
> On Wed, 8 Dec 2010, Shaohui Zheng wrote:
> 
> > Eric,
> > 	the major change on the patchset is on the interface, for the v8 emulator,
> > we accept David's per-node debugfs add_memory interface, we already included
> > in the documentation patch. the change is very small, so it is not obvious.
> > 
> 
> It's still stale as Eric mentioned: for instance, the reference to 
> /sys/kernel/debug/node/add_node which is now under mem_hotplug.  There may 
> be other examples as well.

I forgot to update this part; my carelessness. Thanks Eric and David.

-- 
Thanks & Regards,
Shaohui




* Re: [5/7,v8] NUMA Hotplug Emulator: Support cpu probe/release in x86_64
  2010-12-08 21:36     ` David Rientjes
@ 2010-12-09  9:37       ` Tejun Heo
  -1 siblings, 0 replies; 41+ messages in thread
From: Tejun Heo @ 2010-12-09  9:37 UTC (permalink / raw)
  To: David Rientjes
  Cc: Shaohui Zheng, Andrew Morton, linux-mm, linux-kernel,
	haicheng.li, lethal, Andi Kleen, dave, Greg Kroah-Hartman,
	Ingo Molnar, Len Brown, Yinghai Lu, Haicheng Li

Hello,

On 12/08/2010 10:36 PM, David Rientjes wrote:
> On Tue, 7 Dec 2010, shaohui.zheng@intel.com wrote:
> 
>> From: Shaohui Zheng <shaohui.zheng@intel.com>
>>
>> CPU physical hot-add/hot-remove is supported on some hardware and is
>> already supported in the current linux kernel. NUMA Hotplug Emulator provides
>> a mechanism to emulate the process in software. It can be used for
>> testing or debugging purposes.
>>
>> CPU physical hotplug is different from logical CPU online/offline. Logical
>> online/offline is controlled by the interface /sys/device/cpu/cpuX/online. CPU
>> hotplug emulator uses the probe/release interface, which makes it possible to do
>> cpu hotplug automation and stress testing.
>>
>> Add cpu interface probe/release under sysfs for x86_64. User can use this
>> interface to emulate the cpu hot-add and hot-remove process.
>>
>> Directive:
>> *) Reserve CPUs through a grub parameter like:
>> 	maxcpus=4
>>
>> the rest of the CPUs will not be initialized.
>>
>> *) Probe CPU
>> we can use the probe interface to hot-add new CPUs:
>> 	echo nid > /sys/devices/system/cpu/probe
>>
>> *) Release a CPU
>> 	echo cpu > /sys/devices/system/cpu/release
>>
>> A reserved CPU will be hot-added to the specified node.
>> 1) nid == 0, the CPU will be added to the real node which the CPU
>> should be in
>> 2) nid != 0, add the CPU to node nid even though it is a fake node.
>>
> 
> This patch is undoubtedly going to conflict with Tejun's unification of 
> the 32 and 64 bit NUMA boot paths, specifically the patch at 
> http://marc.info/?l=linux-kernel&m=129087151912379.

Oh yeah, it definitely looks like it will collide with the unification
patch.  The problem is more fundamental than the actual patch
collisions tho.  During x86_32/64 merge, some parts were left unmerged
- some reflect actual differences between 32 and 64 but more were
probably because it was too much work.

These subtle divergences make the code unnecessarily complicated,
fragile and difficult to maintain, so, in general, I think we should
be heading toward unifying 32 and 64 unless the difference is caused
by actual hardware even when the feature or code might not be too
useful for 32bit.

So, the same thing holds for NUMA hotplug emulator.  32bit supports
NUMA and there already is 64bit only NUMA emulator.  I think it would
be much better if we take this chance to unify the 32 and 64bit code paths
in this area rather than going further in the wrong direction.

> Tejun, what's the status of that patchset posted on November 27?  Any 
> comments about this change?

I don't know.  I pinged Ingo yesterday.  Ingo?

Thanks.

-- 
tejun


* Re: [5/7,v8] NUMA Hotplug Emulator: Support cpu probe/release in x86_64
@ 2010-12-09  9:37       ` Tejun Heo
  0 siblings, 0 replies; 41+ messages in thread
From: Tejun Heo @ 2010-12-09  9:37 UTC (permalink / raw)
  To: David Rientjes
  Cc: Shaohui Zheng, Andrew Morton, linux-mm, linux-kernel,
	haicheng.li, lethal, Andi Kleen, dave, Greg Kroah-Hartman,
	Ingo Molnar, Len Brown, Yinghai Lu, Haicheng Li

Hello,

On 12/08/2010 10:36 PM, David Rientjes wrote:
> On Tue, 7 Dec 2010, shaohui.zheng@intel.com wrote:
> 
>> From: Shaohui Zheng <shaohui.zheng@intel.com>
>>
>> CPU physical hot-add/hot-remove are supported on some hardwares, and it 
>> was already supported in current linux kernel. NUMA Hotplug Emulator provides
>> a mechanism to emulate the process with software method. It can be used for
>> testing or debuging purpose.
>>
>> CPU physical hotplug is different with logical CPU online/offline. Logical
>> online/offline is controled by interface /sys/device/cpu/cpuX/online. CPU
>> hotplug emulator uses probe/release interface. It becomes possible to do cpu
>> hotplug automation and stress
>>
>> Add cpu interface probe/release under sysfs for x86_64. User can use this
>> interface to emulate the cpu hot-add and hot-remove process.
>>
>> Directive:
>> *) Reserve CPU thru grub parameter like:
>> 	maxcpus=4
>>
>> the rest CPUs will not be initiliazed. 
>>
>> *) Probe CPU
>> we can use the probe interface to hot-add new CPUs:
>> 	echo nid > /sys/devices/system/cpu/probe
>>
>> *) Release a CPU
>> 	echo cpu > /sys/devices/system/cpu/release
>>
>> A reserved CPU will be hot-added to the specified node.
>> 1) nid == 0, the CPU will be added to the real node which the CPU
>> should be in
>> 2) nid != 0, add the CPU to node nid even through it is a fake node.
>>
> 
> This patch is undoubtedly going to conflict with Tejun's unification of 
> the 32 and 64 bit NUMA boot paths, specifically the patch at 
> http://marc.info/?l=linux-kernel&m=129087151912379.

Oh yeah, it definitely looks like it will collide with the unification
patch.  The problem is more fundamental than the actual patch
collisions, though.  During the x86_32/64 merge, some parts were left
unmerged - some reflect actual differences between 32 and 64 bit, but
more were probably left because merging them was too much work.

These subtle divergences make the code unnecessarily complicated,
fragile and difficult to maintain, so, in general, I think we should
be heading toward unifying 32 and 64 bit unless the difference is caused
by actual hardware, even when the feature or code might not be too
useful for 32bit.

The same thing holds for the NUMA hotplug emulator.  32bit supports
NUMA and there already is a 64bit-only NUMA emulator.  I think it would
be much better if we took this chance to unify the 32 and 64bit code
paths in this area rather than going further in the wrong direction.

> Tejun, what's the status of that patchset posted on November 27?  Any 
> comments about this change?

I don't know.  I pinged Ingo yesterday.  Ingo?

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <dont@kvack.org>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [5/7,v8] NUMA Hotplug Emulator: Support cpu probe/release in x86_64
  2010-12-08 21:36     ` David Rientjes
@ 2010-12-10  1:35       ` Zheng, Shaohui
  -1 siblings, 0 replies; 41+ messages in thread
From: Zheng, Shaohui @ 2010-12-10  1:35 UTC (permalink / raw)
  To: David Rientjes, Tejun Heo
  Cc: Andrew Morton, linux-mm, linux-kernel, haicheng.li, lethal,
	Andi Kleen, dave, Greg Kroah-Hartman, Ingo Molnar, Brown, Len,
	Yinghai Lu, Li, Haicheng, Shaohui Zheng

Both Tejun's patches and mine are under review; the hotplug emulator patchset was posted much earlier than Tejun's patches. I am not sure how to handle this situation.

It seems that I have 3 options:
1) Continue to send this patchset based on the current upstream kernel.
2) Continue to send this patchset based on the upstream kernel plus Tejun's patches.
3) Postpone the patchset until Tejun's patches are accepted.

Can someone provide some suggestions? Thanks so much.

Thanks & Regards,
Shaohui


-----Original Message-----
From: David Rientjes [mailto:rientjes@google.com] 
Sent: Thursday, December 09, 2010 5:37 AM
To: Zheng, Shaohui; Tejun Heo
Cc: Andrew Morton; linux-mm@kvack.org; linux-kernel@vger.kernel.org; haicheng.li@linux.intel.com; lethal@linux-sh.org; Andi Kleen; dave@linux.vnet.ibm.com; Greg Kroah-Hartman; Ingo Molnar; Brown, Len; Yinghai Lu; Li, Haicheng
Subject: Re: [5/7,v8] NUMA Hotplug Emulator: Support cpu probe/release in x86_64

On Tue, 7 Dec 2010, shaohui.zheng@intel.com wrote:

> From: Shaohui Zheng <shaohui.zheng@intel.com>
> 

This patch is undoubtedly going to conflict with Tejun's unification of 
the 32 and 64 bit NUMA boot paths, specifically the patch at 
http://marc.info/?l=linux-kernel&m=129087151912379.

Tejun, what's the status of that patchset posted on November 27?  Any 
comments about this change?

^ permalink raw reply	[flat|nested] 41+ messages in thread


* RE: [5/7,v8] NUMA Hotplug Emulator: Support cpu probe/release in x86_64
  2010-12-09  9:37       ` Tejun Heo
@ 2010-12-10  8:01         ` Zheng, Shaohui
  -1 siblings, 0 replies; 41+ messages in thread
From: Zheng, Shaohui @ 2010-12-10  8:01 UTC (permalink / raw)
  To: Tejun Heo, David Rientjes
  Cc: Andrew Morton, linux-mm, linux-kernel, haicheng.li, lethal,
	Andi Kleen, dave, Greg Kroah-Hartman, Ingo Molnar, Brown, Len,
	Yinghai Lu, Li, Haicheng, Shaohui Zheng

Unifying the 32 and 64 bit NUMA code makes it much simpler to maintain. It is a good direction.

I have already reworked this patch on top of your NUMA unification code, and I added you to the CC list of my patch.

Thanks & Regards,
Shaohui


-----Original Message-----
From: Tejun Heo [mailto:tj@kernel.org] 
Sent: Thursday, December 09, 2010 5:37 PM
To: David Rientjes
Cc: Zheng, Shaohui; Andrew Morton; linux-mm@kvack.org; linux-kernel@vger.kernel.org; haicheng.li@linux.intel.com; lethal@linux-sh.org; Andi Kleen; dave@linux.vnet.ibm.com; Greg Kroah-Hartman; Ingo Molnar; Brown, Len; Yinghai Lu; Li, Haicheng
Subject: Re: [5/7,v8] NUMA Hotplug Emulator: Support cpu probe/release in x86_64

Hello,

On 12/08/2010 10:36 PM, David Rientjes wrote:
> On Tue, 7 Dec 2010, shaohui.zheng@intel.com wrote:
> 
>> From: Shaohui Zheng <shaohui.zheng@intel.com>
>>
>> CPU physical hot-add/hot-remove is supported on some hardware, and it
>> is already supported in the current Linux kernel. NUMA Hotplug Emulator
>> provides a mechanism to emulate the process in software. It can be used
>> for testing or debugging purposes.
>>
>> CPU physical hotplug is different from logical CPU online/offline. Logical
>> online/offline is controlled via the interface /sys/devices/system/cpu/cpuX/online.
>> The CPU hotplug emulator uses a probe/release interface, which makes CPU
>> hotplug automation and stress testing possible.
>>
>> Add the cpu interfaces probe/release under sysfs for x86_64. Users can use
>> these interfaces to emulate the CPU hot-add and hot-remove process.
>>
>> Usage:
>> *) Reserve CPUs through a grub kernel parameter such as:
>> 	maxcpus=4
>>
>> The remaining CPUs will not be initialized.
>>
>> *) Probe a CPU
>> Use the probe interface to hot-add new CPUs:
>> 	echo nid > /sys/devices/system/cpu/probe
>>
>> *) Release a CPU
>> 	echo cpu > /sys/devices/system/cpu/release
>>
>> A reserved CPU will be hot-added to the specified node:
>> 1) if nid == 0, the CPU is added to the real node that the CPU
>> should belong to;
>> 2) if nid != 0, the CPU is added to node nid even though it is a fake node.
>>
>>
> 
> This patch is undoubtedly going to conflict with Tejun's unification of 
> the 32 and 64 bit NUMA boot paths, specifically the patch at 
> http://marc.info/?l=linux-kernel&m=129087151912379.

Oh yeah, it definitely looks like it will collide with the unification
patch.  The problem is more fundamental than the actual patch
collisions, though.  During the x86_32/64 merge, some parts were left
unmerged - some reflect actual differences between 32 and 64 bit, but
more were probably left because merging them was too much work.

These subtle divergences make the code unnecessarily complicated,
fragile and difficult to maintain, so, in general, I think we should
be heading toward unifying 32 and 64 bit unless the difference is caused
by actual hardware, even when the feature or code might not be too
useful for 32bit.

The same thing holds for the NUMA hotplug emulator.  32bit supports
NUMA and there already is a 64bit-only NUMA emulator.  I think it would
be much better if we took this chance to unify the 32 and 64bit code
paths in this area rather than going further in the wrong direction.

> Tejun, what's the status of that patchset posted on November 27?  Any 
> comments about this change?

I don't know.  I pinged Ingo yesterday.  Ingo?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 41+ messages in thread


end of thread, other threads:[~2010-12-10  8:01 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-07  1:00 [0/7,v8] NUMA Hotplug Emulator (v8) shaohui.zheng
2010-12-07  1:00 ` shaohui.zheng
2010-12-07  1:00 ` [1/7,v8] NUMA Hotplug Emulator: documentation shaohui.zheng
2010-12-07  1:00   ` shaohui.zheng
2010-12-07 18:24   ` Eric B Munson
2010-12-07 23:20     ` Shaohui Zheng
2010-12-07 23:20       ` Shaohui Zheng
2010-12-08 17:46       ` Eric B Munson
2010-12-09  0:09         ` Shaohui Zheng
2010-12-09  0:09           ` Shaohui Zheng
2010-12-08 18:16       ` Eric B Munson
2010-12-08 21:16         ` David Rientjes
2010-12-08 21:16           ` David Rientjes
2010-12-09  0:23           ` Shaohui Zheng
2010-12-09  0:23             ` Shaohui Zheng
2010-12-08 21:18       ` David Rientjes
2010-12-08 21:18         ` David Rientjes
2010-12-09  0:33         ` Shaohui Zheng
2010-12-09  0:33           ` Shaohui Zheng
2010-12-07  1:00 ` [2/7,v8] NUMA Hotplug Emulator: Add numa=possible option shaohui.zheng
2010-12-07  1:00   ` shaohui.zheng
2010-12-07  1:00 ` [3/7,v8] NUMA Hotplug Emulator: Add node hotplug emulation shaohui.zheng
2010-12-07  1:00   ` shaohui.zheng
2010-12-07  1:00 ` [4/7,v8] NUMA Hotplug Emulator: Abstract cpu register functions shaohui.zheng
2010-12-07  1:00   ` shaohui.zheng
2010-12-07  1:00 ` [5/7,v8] NUMA Hotplug Emulator: Support cpu probe/release in x86_64 shaohui.zheng
2010-12-07  1:00   ` shaohui.zheng
2010-12-08 21:36   ` David Rientjes
2010-12-08 21:36     ` David Rientjes
2010-12-09  9:37     ` Tejun Heo
2010-12-09  9:37       ` Tejun Heo
2010-12-10  8:01       ` Zheng, Shaohui
2010-12-10  8:01         ` Zheng, Shaohui
2010-12-10  1:35     ` Zheng, Shaohui
2010-12-10  1:35       ` Zheng, Shaohui
2010-12-07  1:00 ` [6/7,v8] NUMA Hotplug Emulator: Fake CPU socket with logical CPU on x86 shaohui.zheng
2010-12-07  1:00   ` shaohui.zheng
2010-12-07  1:00 ` [7/7,v8] NUMA Hotplug Emulator: Implement per-node add_memory debugfs interface shaohui.zheng
2010-12-07  1:00   ` shaohui.zheng
2010-12-08 21:31   ` David Rientjes
2010-12-08 21:31     ` David Rientjes
