linux-kernel.vger.kernel.org archive mirror
* [patch 0/5] x86_64 CPU hotplug patch series.
@ 2005-06-02 12:57 Ashok Raj
  2005-06-02 12:57 ` [patch 1/5] x86_64: Change init sections for CPU hotplug support Ashok Raj
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Ashok Raj @ 2005-06-02 12:57 UTC (permalink / raw)
  To: Andi Kleen, Andrew Morton
  Cc: linux-kernel, discuss, Rusty Russell, Srivattsa Vaddagiri,
	Zwane Mwaikambo

Hi,

Attached is a set of patches to support x86-64 logical CPU hot-add/remove.

Special thanks to Andi for painfully reviewing some interim versions.

It holds up through overnight stress tests with make -j runs and
hotplug happening in parallel.

Andrew: could you help by staging these in -mm, so we can get some wider
testing from those interested?

*Sore point*: Andi doesn't agree with one patch, which removes the IPI
broadcast shortcut and sends IPIs only to CPUs in the online map. This is a
much simpler approach than trying to undo the ill effects of broadcasting
IPIs to CPUs in the offline state.

Andi's initial concern was IPI performance, but a primitive test with a good
number of samples doesn't indicate any degradation at all; in fact the
results look identical (barring any operator errors :-( ).

It would be nice to hear other opinions as well; hopefully we can settle on
the right approach in this case. Here is a link to an earlier discussion
on the topic:

http://marc.theaimsgroup.com/?l=linux-kernel&m=111695485610434&w=2

Rusty (who has been hiding somewhere in the woods these days :-)), could 
you suggest something...

Cheers,
Ashok Raj



* [patch 1/5] x86_64: Change init sections for CPU hotplug support
  2005-06-02 12:57 [patch 0/5] x86_64 CPU hotplug patch series Ashok Raj
@ 2005-06-02 12:57 ` Ashok Raj
  2005-06-02 20:14   ` Zwane Mwaikambo
  2005-06-02 12:57 ` [patch 2/5] x86_64: " Ashok Raj
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Ashok Raj @ 2005-06-02 12:57 UTC (permalink / raw)
  To: Andi Kleen, Andrew Morton
  Cc: linux-kernel, discuss, Rusty Russell, Srivattsa Vaddagiri,
	Zwane Mwaikambo, Ashok Raj

[-- Attachment #1: x86_64-cpuhp-initcall-cleanup.patch --]
[-- Type: text/plain, Size: 13524 bytes --]

This patch adds __cpuinit and __cpuinitdata markers for code and data that
need to exist past boot to support CPU hotplug.

Caveat: this is done *only* for EM64T CPU hotplug support, on request from
Andi Kleen. Most of the generic hotplug code in the kernel, and the other
archs that support CPU hotplug today (i386, ia64, ppc64, s390 and parisc),
don't mark sections with __cpuinit; they only mark them as __devinit and
__devinitdata.

If someone is motivated to change the generic code, we need to make sure
none of the existing hotplug code breaks on the other archs that don't use
__cpuinit and __cpuinitdata.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
-----------------------------------------
 arch/x86_64/kernel/apic.c      |    8 ++++----
 arch/x86_64/kernel/i387.c      |    2 +-
 arch/x86_64/kernel/mce.c       |    8 ++++----
 arch/x86_64/kernel/mce_intel.c |    4 ++--
 arch/x86_64/kernel/nmi.c       |    4 ++--
 arch/x86_64/kernel/process.c   |    2 +-
 arch/x86_64/kernel/setup.c     |   18 +++++++++---------
 arch/x86_64/kernel/setup64.c   |    6 +++---
 arch/x86_64/kernel/smpboot.c   |   15 +++++----------
 arch/x86_64/mm/numa.c          |    2 +-
 include/linux/init.h           |   13 +++++++++++++
 mm/page_alloc.c                |    2 +-
 12 files changed, 46 insertions(+), 38 deletions(-)

Index: linux-2.6.12-rc5-mm2/include/linux/init.h
===================================================================
--- linux-2.6.12-rc5-mm2.orig/include/linux/init.h
+++ linux-2.6.12-rc5-mm2/include/linux/init.h
@@ -222,6 +222,19 @@ void __init parse_early_param(void);
 #define __devinitdata
 #define __devexit
 #define __devexitdata
+
+#ifdef CONFIG_HOTPLUG_CPU
+#define __cpuinit
+#define __cpuinitdata
+#define __cpuexit
+#define __cpuexitdata
+#else
+#define __cpuinit	__init
+#define __cpuinitdata __initdata
+#define __cpuexit __exit
+#define __cpuexitdata	__exitdata
+#endif
+
 #else
 #define __devinit __init
 #define __devinitdata __initdata
Index: linux-2.6.12-rc5-mm2/arch/x86_64/kernel/smpboot.c
===================================================================
--- linux-2.6.12-rc5-mm2.orig/arch/x86_64/kernel/smpboot.c
+++ linux-2.6.12-rc5-mm2/arch/x86_64/kernel/smpboot.c
@@ -58,11 +58,6 @@
 #include <asm/proto.h>
 #include <asm/nmi.h>
 
-/* Change for real CPU hotplug. Note other files need to be fixed
-   first too. */
-#define __cpuinit __init
-#define __cpuinitdata __initdata
-
 /* Number of siblings per CPU package */
 int smp_num_siblings = 1;
 /* Package ID of each logical CPU */
@@ -823,7 +818,7 @@ static __cpuinit void smp_cleanup_boot(v
  *
  * RED-PEN audit/test this more. I bet there is more state messed up here.
  */
-static __cpuinit void disable_smp(void)
+static __init void disable_smp(void)
 {
 	cpu_present_map = cpumask_of_cpu(0);
 	cpu_possible_map = cpumask_of_cpu(0);
@@ -838,7 +833,7 @@ static __cpuinit void disable_smp(void)
 /*
  * Handle user cpus=... parameter.
  */
-static __cpuinit void enforce_max_cpus(unsigned max_cpus)
+static __init void enforce_max_cpus(unsigned max_cpus)
 {
 	int i, k;
 	k = 0;
@@ -855,7 +850,7 @@ static __cpuinit void enforce_max_cpus(u
 /*
  * Various sanity checks.
  */
-static int __cpuinit smp_sanity_check(unsigned max_cpus)
+static int __init smp_sanity_check(unsigned max_cpus)
 {
 	if (!physid_isset(hard_smp_processor_id(), phys_cpu_present_map)) {
 		printk("weird, boot CPU (#%d) not listed by the BIOS.\n",
@@ -913,7 +908,7 @@ static int __cpuinit smp_sanity_check(un
  * Prepare for SMP bootup.  The MP table or ACPI has been read
  * earlier.  Just do some sanity checking here and enable APIC mode.
  */
-void __cpuinit smp_prepare_cpus(unsigned int max_cpus)
+void __init smp_prepare_cpus(unsigned int max_cpus)
 {
 	int i;
 
@@ -1019,7 +1014,7 @@ int __cpuinit __cpu_up(unsigned int cpu)
 /*
  * Finish the SMP boot.
  */
-void __cpuinit smp_cpus_done(unsigned int max_cpus)
+void __init smp_cpus_done(unsigned int max_cpus)
 {
 	zap_low_mappings();
 	smp_cleanup_boot();
Index: linux-2.6.12-rc5-mm2/arch/x86_64/kernel/process.c
===================================================================
--- linux-2.6.12-rc5-mm2.orig/arch/x86_64/kernel/process.c
+++ linux-2.6.12-rc5-mm2/arch/x86_64/kernel/process.c
@@ -204,7 +204,7 @@ static void mwait_idle(void)
 	}
 }
 
-void __init select_idle_routine(const struct cpuinfo_x86 *c)
+void __cpuinit select_idle_routine(const struct cpuinfo_x86 *c)
 {
 	static int printed;
 	if (cpu_has(c, X86_FEATURE_MWAIT)) {
Index: linux-2.6.12-rc5-mm2/arch/x86_64/kernel/setup.c
===================================================================
--- linux-2.6.12-rc5-mm2.orig/arch/x86_64/kernel/setup.c
+++ linux-2.6.12-rc5-mm2/arch/x86_64/kernel/setup.c
@@ -708,7 +708,7 @@ void __init setup_arch(char **cmdline_p)
 #endif
 }
 
-static int __init get_model_name(struct cpuinfo_x86 *c)
+static int __cpuinit get_model_name(struct cpuinfo_x86 *c)
 {
 	unsigned int *v;
 
@@ -724,7 +724,7 @@ static int __init get_model_name(struct 
 }
 
 
-static void __init display_cacheinfo(struct cpuinfo_x86 *c)
+static void __cpuinit display_cacheinfo(struct cpuinfo_x86 *c)
 {
 	unsigned int n, dummy, eax, ebx, ecx, edx;
 
@@ -835,7 +835,7 @@ static int __init init_amd(struct cpuinf
 	return r;
 }
 
-static void __init detect_ht(struct cpuinfo_x86 *c)
+static void __cpuinit detect_ht(struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_SMP
 	u32 	eax, ebx, ecx, edx;
@@ -896,7 +896,7 @@ static void __init detect_ht(struct cpui
 /*
  * find out the number of processor cores on the die
  */
-static int __init intel_num_cpu_cores(struct cpuinfo_x86 *c)
+static int __cpuinit intel_num_cpu_cores(struct cpuinfo_x86 *c)
 {
 	unsigned int eax;
 
@@ -914,7 +914,7 @@ static int __init intel_num_cpu_cores(st
 		return 1;
 }
 
-static void __init init_intel(struct cpuinfo_x86 *c)
+static void __cpuinit init_intel(struct cpuinfo_x86 *c)
 {
 	/* Cache sizes */
 	unsigned n;
@@ -934,7 +934,7 @@ static void __init init_intel(struct cpu
  	c->x86_num_cores = intel_num_cpu_cores(c);
 }
 
-void __init get_cpu_vendor(struct cpuinfo_x86 *c)
+void __cpuinit get_cpu_vendor(struct cpuinfo_x86 *c)
 {
 	char *v = c->x86_vendor_id;
 
@@ -955,7 +955,7 @@ struct cpu_model_info {
 /* Do some early cpuid on the boot CPU to get some parameter that are
    needed before check_bugs. Everything advanced is in identify_cpu
    below. */
-void __init early_identify_cpu(struct cpuinfo_x86 *c)
+void __cpuinit early_identify_cpu(struct cpuinfo_x86 *c)
 {
 	u32 tfms;
 
@@ -1009,7 +1009,7 @@ void __init early_identify_cpu(struct cp
 /*
  * This does the hard work of actually picking apart the CPU stuff...
  */
-void __init identify_cpu(struct cpuinfo_x86 *c)
+void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
 {
 	int i;
 	u32 xlvl;
@@ -1086,7 +1086,7 @@ void __init identify_cpu(struct cpuinfo_
 }
  
 
-void __init print_cpu_info(struct cpuinfo_x86 *c)
+void __cpuinit print_cpu_info(struct cpuinfo_x86 *c)
 {
 	if (c->x86_model_id[0])
 		printk("%s", c->x86_model_id);
Index: linux-2.6.12-rc5-mm2/arch/x86_64/kernel/apic.c
===================================================================
--- linux-2.6.12-rc5-mm2.orig/arch/x86_64/kernel/apic.c
+++ linux-2.6.12-rc5-mm2/arch/x86_64/kernel/apic.c
@@ -321,7 +321,7 @@ void __init init_bsp_APIC(void)
 	apic_write_around(APIC_LVT1, value);
 }
 
-void __init setup_local_APIC (void)
+void __cpuinit setup_local_APIC (void)
 {
 	unsigned int value, ver, maxlvt;
 
@@ -570,7 +570,7 @@ static struct sys_device device_lapic = 
 	.cls		= &lapic_sysclass,
 };
 
-static void __init apic_pm_activate(void)
+static void __cpuinit apic_pm_activate(void)
 {
 	apic_pm_state.active = 1;
 }
@@ -810,14 +810,14 @@ void __init setup_boot_APIC_clock (void)
 	local_irq_enable();
 }
 
-void __init setup_secondary_APIC_clock(void)
+void __cpuinit setup_secondary_APIC_clock(void)
 {
 	local_irq_disable(); /* FIXME: Do we need this? --RR */
 	setup_APIC_timer(calibration_result);
 	local_irq_enable();
 }
 
-void __init disable_APIC_timer(void)
+void __cpuinit disable_APIC_timer(void)
 {
 	if (using_apic_timer) {
 		unsigned long v;
Index: linux-2.6.12-rc5-mm2/arch/x86_64/kernel/setup64.c
===================================================================
--- linux-2.6.12-rc5-mm2.orig/arch/x86_64/kernel/setup64.c
+++ linux-2.6.12-rc5-mm2/arch/x86_64/kernel/setup64.c
@@ -29,7 +29,7 @@
 
 char x86_boot_params[BOOT_PARAM_SIZE] __initdata = {0,};
 
-cpumask_t cpu_initialized __initdata = CPU_MASK_NONE;
+cpumask_t cpu_initialized __cpuinitdata = CPU_MASK_NONE;
 
 struct x8664_pda cpu_pda[NR_CPUS] __cacheline_aligned; 
 
@@ -171,7 +171,7 @@ void syscall_init(void)
 	wrmsrl(MSR_SYSCALL_MASK, EF_TF|EF_DF|EF_IE|0x3000); 
 }
 
-void __init check_efer(void)
+void __cpuinit check_efer(void)
 {
 	unsigned long efer;
 
@@ -188,7 +188,7 @@ void __init check_efer(void)
  * 'CPU state barrier', nothing should get across.
  * A lot of state is already set up in PDA init.
  */
-void __init cpu_init (void)
+void __cpuinit cpu_init (void)
 {
 #ifdef CONFIG_SMP
 	int cpu = stack_smp_processor_id();
Index: linux-2.6.12-rc5-mm2/arch/x86_64/kernel/i387.c
===================================================================
--- linux-2.6.12-rc5-mm2.orig/arch/x86_64/kernel/i387.c
+++ linux-2.6.12-rc5-mm2/arch/x86_64/kernel/i387.c
@@ -42,7 +42,7 @@ void mxcsr_feature_mask_init(void)
  * Called at bootup to set up the initial FPU state that is later cloned
  * into all processes.
  */
-void __init fpu_init(void)
+void __cpuinit fpu_init(void)
 {
 	unsigned long oldcr0 = read_cr0();
 	extern void __bad_fxsave_alignment(void);
Index: linux-2.6.12-rc5-mm2/mm/page_alloc.c
===================================================================
--- linux-2.6.12-rc5-mm2.orig/mm/page_alloc.c
+++ linux-2.6.12-rc5-mm2/mm/page_alloc.c
@@ -72,7 +72,7 @@ EXPORT_SYMBOL(zone_table);
 
 #ifdef CONFIG_NUMA
 static struct per_cpu_pageset
-	pageset_table[MAX_NR_ZONES*MAX_NUMNODES*NR_CPUS] __initdata;
+	pageset_table[MAX_NR_ZONES*MAX_NUMNODES*NR_CPUS] __cpuinitdata;
 #endif
 
 static char *zone_names[MAX_NR_ZONES] = { "DMA", "Normal", "HighMem" };
Index: linux-2.6.12-rc5-mm2/arch/x86_64/kernel/nmi.c
===================================================================
--- linux-2.6.12-rc5-mm2.orig/arch/x86_64/kernel/nmi.c
+++ linux-2.6.12-rc5-mm2/arch/x86_64/kernel/nmi.c
@@ -98,7 +98,7 @@ static unsigned int nmi_p4_cccr_val;
 	(P4_CCCR_OVF_PMI0|P4_CCCR_THRESHOLD(15)|P4_CCCR_COMPLEMENT|	\
 	 P4_CCCR_COMPARE|P4_CCCR_REQUIRED|P4_CCCR_ESCR_SELECT(4)|P4_CCCR_ENABLE)
 
-static __init inline int nmi_known_cpu(void)
+static __cpuinit inline int nmi_known_cpu(void)
 {
 	switch (boot_cpu_data.x86_vendor) {
 	case X86_VENDOR_AMD:
@@ -110,7 +110,7 @@ static __init inline int nmi_known_cpu(v
 }
 
 /* Run after command line and cpu_init init, but before all other checks */
-void __init nmi_watchdog_default(void)
+void __cpuinit nmi_watchdog_default(void)
 {
 	if (nmi_watchdog != NMI_DEFAULT)
 		return;
Index: linux-2.6.12-rc5-mm2/arch/x86_64/mm/numa.c
===================================================================
--- linux-2.6.12-rc5-mm2.orig/arch/x86_64/mm/numa.c
+++ linux-2.6.12-rc5-mm2/arch/x86_64/mm/numa.c
@@ -251,7 +251,7 @@ void __init numa_initmem_init(unsigned l
 	setup_node_bootmem(0, start_pfn << PAGE_SHIFT, end_pfn << PAGE_SHIFT);
 }
 
-__init void numa_add_cpu(int cpu)
+__cpuinit void numa_add_cpu(int cpu)
 {
 	/* BP is initialized elsewhere */
 	if (cpu) 
Index: linux-2.6.12-rc5-mm2/arch/x86_64/kernel/mce.c
===================================================================
--- linux-2.6.12-rc5-mm2.orig/arch/x86_64/kernel/mce.c
+++ linux-2.6.12-rc5-mm2/arch/x86_64/kernel/mce.c
@@ -327,7 +327,7 @@ static void mce_init(void *dummy)
 }
 
 /* Add per CPU specific workarounds here */
-static void __init mce_cpu_quirks(struct cpuinfo_x86 *c) 
+static void __cpuinit mce_cpu_quirks(struct cpuinfo_x86 *c)
 { 
 	/* This should be disabled by the BIOS, but isn't always */
 	if (c->x86_vendor == X86_VENDOR_AMD && c->x86 == 15) {
@@ -337,7 +337,7 @@ static void __init mce_cpu_quirks(struct
 	}
 }			
 
-static void __init mce_cpu_features(struct cpuinfo_x86 *c)
+static void __cpuinit mce_cpu_features(struct cpuinfo_x86 *c)
 {
 	switch (c->x86_vendor) {
 	case X86_VENDOR_INTEL:
@@ -352,7 +352,7 @@ static void __init mce_cpu_features(stru
  * Called for each booted CPU to set up machine checks.
  * Must be called with preempt off. 
  */
-void __init mcheck_init(struct cpuinfo_x86 *c)
+void __cpuinit mcheck_init(struct cpuinfo_x86 *c)
 {
 	static cpumask_t mce_cpus __initdata = CPU_MASK_NONE;
 
@@ -542,7 +542,7 @@ ACCESSOR(bank4ctl,bank[4],mce_restart())
 ACCESSOR(tolerant,tolerant,)
 ACCESSOR(check_interval,check_interval,mce_restart())
 
-static __init int mce_init_device(void)
+static __cpuinit int mce_init_device(void)
 {
 	int err;
 	if (!mce_available(&boot_cpu_data))
Index: linux-2.6.12-rc5-mm2/arch/x86_64/kernel/mce_intel.c
===================================================================
--- linux-2.6.12-rc5-mm2.orig/arch/x86_64/kernel/mce_intel.c
+++ linux-2.6.12-rc5-mm2/arch/x86_64/kernel/mce_intel.c
@@ -42,7 +42,7 @@ done:
 	irq_exit();
 }
 
-static void __init intel_init_thermal(struct cpuinfo_x86 *c)
+static void __cpuinit intel_init_thermal(struct cpuinfo_x86 *c)
 {
 	u32 l, h;
 	int tm2 = 0;
@@ -93,7 +93,7 @@ static void __init intel_init_thermal(st
 	return;
 }
 
-void __init mce_intel_feature_init(struct cpuinfo_x86 *c)
+void __cpuinit mce_intel_feature_init(struct cpuinfo_x86 *c)
 {
 	intel_init_thermal(c);
 }

--



* [patch 2/5] x86_64: CPU hotplug support.
  2005-06-02 12:57 [patch 0/5] x86_64 CPU hotplug patch series Ashok Raj
  2005-06-02 12:57 ` [patch 1/5] x86_64: Change init sections for CPU hotplug support Ashok Raj
@ 2005-06-02 12:57 ` Ashok Raj
  2005-06-02 20:19   ` Zwane Mwaikambo
  2005-06-02 12:57 ` [patch 3/5] x86_64: CPU hotplug sibling map cleanup Ashok Raj
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Ashok Raj @ 2005-06-02 12:57 UTC (permalink / raw)
  To: Andi Kleen, Andrew Morton
  Cc: linux-kernel, discuss, Rusty Russell, Srivattsa Vaddagiri,
	Zwane Mwaikambo

[-- Attachment #1: x86_64-cpuhotplug.patch --]
[-- Type: text/plain, Size: 15258 bytes --]

Experimental CPU hotplug patch for x86_64
-----------------------------------------
This supports logical CPU online and offline.
- Tested with maxcpus=1, then bringing the other CPUs up and down, to check
  that the init code is all cleaned up. CONFIG_SCHED_SMT works as well.
- Idle threads are forked on demand from keventd for a clean startup.

TBD: 
1. Not tested on a real NUMA machine (tested with numa=fake=2)
2. Handle ACPI pieces for physical hotplug support.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
-----------------------------------------

 arch/i386/mach-default/topology.c |   15 +-
 arch/x86_64/Kconfig               |    9 +
 arch/x86_64/kernel/irq.c          |   30 +++++
 arch/x86_64/kernel/process.c      |   29 +++++
 arch/x86_64/kernel/smp.c          |   12 ++
 arch/x86_64/kernel/smpboot.c      |  201 ++++++++++++++++++++++++++++++++++----
 arch/x86_64/kernel/traps.c        |    8 +
 include/asm-x86_64/irq.h          |    5 
 include/asm-x86_64/smp.h          |    4 
 9 files changed, 286 insertions(+), 27 deletions(-)

Index: linux-2.6.12-rc5-mm2/arch/x86_64/Kconfig
===================================================================
--- linux-2.6.12-rc5-mm2.orig/arch/x86_64/Kconfig
+++ linux-2.6.12-rc5-mm2/arch/x86_64/Kconfig
@@ -288,6 +288,15 @@ config NR_CPUS
 	  This is purely to save memory - each supported CPU requires
 	  memory in the static kernel configuration.
 
+config HOTPLUG_CPU
+	bool "Support for hot-pluggable CPUs (EXPERIMENTAL)"
+	depends on SMP && HOTPLUG && EXPERIMENTAL
+	help
+		Say Y here to experiment with turning CPUs off and on.  CPUs
+		can be controlled through /sys/devices/system/cpu/cpu#.
+		Say N if you want to disable CPU hotplug.
+
+
 config HPET_TIMER
 	bool
 	default y
Index: linux-2.6.12-rc5-mm2/arch/x86_64/kernel/smpboot.c
===================================================================
--- linux-2.6.12-rc5-mm2.orig/arch/x86_64/kernel/smpboot.c
+++ linux-2.6.12-rc5-mm2/arch/x86_64/kernel/smpboot.c
@@ -34,6 +34,7 @@
  *      Andi Kleen              :       Converted to new state machine.
  *					Various cleanups.
  *					Probably mostly hotplug CPU ready now.
+ *	Ashok Raj			: CPU hotplug support
  */
 
 
@@ -98,6 +99,37 @@ EXPORT_SYMBOL(cpu_core_map);
 extern unsigned char trampoline_data[];
 extern unsigned char trampoline_end[];
 
+/* State of each CPU */
+DEFINE_PER_CPU(int, cpu_state) = { 0 };
+
+/*
+ * Store all idle threads, this can be reused instead of creating
+ * a new thread. Also avoids complicated thread destroy functionality
+ * for idle threads.
+ */
+struct task_struct *idle_thread_array[NR_CPUS] __cpuinitdata ;
+
+#define get_idle_for_cpu(x)     (idle_thread_array[(x)])
+#define set_idle_for_cpu(x,p)   (idle_thread_array[(x)] = (p))
+
+/*
+ * cpu_possible_map should be static, it cannot change as cpu's
+ * are onlined, or offlined. The reason is per-cpu data-structures
+ * are allocated by some modules at init time, and dont expect to
+ * do this dynamically on cpu arrival/departure.
+ * cpu_present_map on the other hand can change dynamically.
+ * In case when cpu_hotplug is not compiled, then we resort to current
+ * behaviour, which is cpu_possible == cpu_present.
+ * If cpu-hotplug is supported, then we need to preallocate for all
+ * those NR_CPUS, hence cpu_possible_map represents entire NR_CPUS range.
+ * - Ashok Raj
+ */
+#ifdef CONFIG_HOTPLUG_CPU
+#define fixup_cpu_possible_map(x)	cpu_set((x), cpu_possible_map)
+#else
+#define fixup_cpu_possible_map(x)
+#endif
+
 /*
  * Currently trivial. Write the real->protected mode
  * bootstrap into the page concerned. The caller
@@ -445,8 +477,10 @@ void __cpuinit start_secondary(void)
 	/*
 	 * Allow the master to continue.
 	 */
+	lock_ipi_call_lock();
 	cpu_set(smp_processor_id(), cpu_online_map);
 	mb();
+	unlock_ipi_call_lock();
 
 	/* Wait for TSC sync to not schedule things before.
 	   We still process interrupts, which could see an inconsistent
@@ -623,33 +657,67 @@ static int __cpuinit wakeup_secondary_vi
 	return (send_status | accept_status);
 }
 
+struct create_idle {
+	struct task_struct *idle;
+	struct completion done;
+	int cpu;
+};
+
+void do_fork_idle(void *_c_idle)
+{
+	struct create_idle *c_idle = _c_idle;
+
+	c_idle->idle = fork_idle(c_idle->cpu);
+	complete(&c_idle->done);
+}
+
 /*
  * Boot one CPU.
  */
 static int __cpuinit do_boot_cpu(int cpu, int apicid)
 {
-	struct task_struct *idle;
 	unsigned long boot_error;
 	int timeout;
 	unsigned long start_rip;
-	/*
-	 * We can't use kernel_thread since we must avoid to
-	 * reschedule the child.
-	 */
-	idle = fork_idle(cpu);
-	if (IS_ERR(idle)) {
+	struct create_idle c_idle = {
+		.cpu = cpu,
+		.done = COMPLETION_INITIALIZER(c_idle.done),
+	};
+	DECLARE_WORK(work, do_fork_idle, &c_idle);
+
+	c_idle.idle = get_idle_for_cpu(cpu);
+
+	if (c_idle.idle) {
+		c_idle.idle->thread.rsp = (unsigned long) (((struct pt_regs *)
+			(THREAD_SIZE + (unsigned long) c_idle.idle->thread_info)) - 1);
+		init_idle(c_idle.idle, cpu);
+		goto do_rest;
+	}
+
+	if (!keventd_up() || current_is_keventd())
+		work.func(work.data);
+	else {
+		schedule_work(&work);
+		wait_for_completion(&c_idle.done);
+	}
+
+	if (IS_ERR(c_idle.idle)) {
 		printk("failed fork for CPU %d\n", cpu);
-		return PTR_ERR(idle);
+		return PTR_ERR(c_idle.idle);
 	}
 
-	cpu_pda[cpu].pcurrent = idle;
+	set_idle_for_cpu(cpu, c_idle.idle);
+
+do_rest:
+
+	cpu_pda[cpu].pcurrent = c_idle.idle;
 
 	start_rip = setup_trampoline();
 
-	init_rsp = idle->thread.rsp;
+	init_rsp = c_idle.idle->thread.rsp;
 	per_cpu(init_tss,cpu).rsp0 = init_rsp;
 	initial_code = start_secondary;
-	clear_ti_thread_flag(idle->thread_info, TIF_FORK);
+	clear_ti_thread_flag(c_idle.idle->thread_info, TIF_FORK);
 
 	printk(KERN_INFO "Booting processor %d/%d rip %lx rsp %lx\n", cpu, apicid,
 	       start_rip, init_rsp);
@@ -925,10 +993,9 @@ void __init smp_prepare_cpus(unsigned in
 		int apicid = cpu_present_to_apicid(i);
 		if (physid_isset(apicid, phys_cpu_present_map)) {
 			cpu_set(i, cpu_present_map);
-			/* possible map would be different if we supported real
-			   CPU hotplug. */
 			cpu_set(i, cpu_possible_map);
 		}
+		fixup_cpu_possible_map(i);
 	}
 
 	if (smp_sanity_check(max_cpus) < 0) {
@@ -977,9 +1044,6 @@ void __init smp_prepare_boot_cpu(void)
 
 /*
  * Entry point to boot a CPU.
- *
- * This is all __cpuinit, not __devinit for now because we don't support
- * CPU hotplug (yet).
  */
 int __cpuinit __cpu_up(unsigned int cpu)
 {
@@ -996,6 +1060,14 @@ int __cpuinit __cpu_up(unsigned int cpu)
 		return -EINVAL;
 	}
 
+	/*
+	 * Already booted CPU?
+	 */
+ 	if (cpu_isset(cpu, cpu_callin_map)) {
+		Dprintk ("do_boot_cpu %d Already started\n", cpu);
+ 		return -ENOSYS;
+	}
+
 	/* Boot it! */
 	err = do_boot_cpu(cpu, apicid);
 	if (err < 0) {
@@ -1008,7 +1080,9 @@ int __cpuinit __cpu_up(unsigned int cpu)
 
 	while (!cpu_isset(cpu, cpu_online_map))
 		cpu_relax();
-	return 0;
+	err = 0;
+
+	return err;
 }
 
 /*
@@ -1016,7 +1090,9 @@ int __cpuinit __cpu_up(unsigned int cpu)
  */
 void __init smp_cpus_done(unsigned int max_cpus)
 {
+#ifndef CONFIG_HOTPLUG_CPU
 	zap_low_mappings();
+#endif
 	smp_cleanup_boot();
 
 #ifdef CONFIG_X86_IO_APIC
@@ -1028,3 +1104,94 @@ void __init smp_cpus_done(unsigned int m
 
 	check_nmi_watchdog();
 }
+
+#ifdef CONFIG_HOTPLUG_CPU
+
+static void
+remove_siblinginfo(int cpu)
+{
+	int sibling;
+
+	for_each_cpu_mask(sibling, cpu_sibling_map[cpu])
+		cpu_clear(cpu, cpu_sibling_map[sibling]);
+	for_each_cpu_mask(sibling, cpu_core_map[cpu])
+		cpu_clear(cpu, cpu_core_map[sibling]);
+	cpus_clear(cpu_sibling_map[cpu]);
+	cpus_clear(cpu_core_map[cpu]);
+	phys_proc_id[cpu] = BAD_APICID;
+	cpu_core_id[cpu] = BAD_APICID;
+}
+
+void remove_cpu_from_maps(void)
+{
+	int cpu = smp_processor_id();
+
+	cpu_clear(cpu, cpu_callout_map);
+	cpu_clear(cpu, cpu_callin_map);
+	clear_bit(cpu, &cpu_initialized); /* was set by cpu_init() */
+}
+
+int __cpu_disable(void)
+{
+	int cpu = smp_processor_id();
+
+	/*
+	 * Perhaps use cpufreq to drop frequency, but that could go
+	 * into generic code.
+ 	 *
+	 * We won't take down the boot processor on i386 due to some
+	 * interrupts only being able to be serviced by the BSP.
+	 * Especially so if we're not using an IOAPIC	-zwane
+	 */
+	if (cpu == 0)
+		return -EBUSY;
+
+	disable_APIC_timer();
+
+	/*
+	 * HACK:
+	 * Allow any queued timer interrupts to get serviced
+	 * This is only a temporary solution until we cleanup
+	 * fixup_irqs as we do for IA64.
+	 */
+	local_irq_enable();
+	mdelay(1);
+
+	local_irq_disable();
+	remove_siblinginfo(cpu);
+
+	/* It's now safe to remove this processor from the online map */
+	cpu_clear(cpu, cpu_online_map);
+	remove_cpu_from_maps();
+	fixup_irqs(cpu_online_map);
+	return 0;
+}
+
+void __cpu_die(unsigned int cpu)
+{
+	/* We don't do anything here: idle task is faking death itself. */
+	unsigned int i;
+
+	for (i = 0; i < 10; i++) {
+		/* They ack this in play_dead by setting CPU_DEAD */
+		if (per_cpu(cpu_state, cpu) == CPU_DEAD)
+			return;
+		current->state = TASK_UNINTERRUPTIBLE;
+		schedule_timeout(HZ/10);
+	}
+ 	printk(KERN_ERR "CPU %u didn't die...\n", cpu);
+}
+
+#else /* ... !CONFIG_HOTPLUG_CPU */
+
+int __cpu_disable(void)
+{
+	return -ENOSYS;
+}
+
+void __cpu_die(unsigned int cpu)
+{
+	/* We said "no" in __cpu_disable */
+	BUG();
+}
+#endif /* CONFIG_HOTPLUG_CPU */
Index: linux-2.6.12-rc5-mm2/arch/x86_64/kernel/irq.c
===================================================================
--- linux-2.6.12-rc5-mm2.orig/arch/x86_64/kernel/irq.c
+++ linux-2.6.12-rc5-mm2/arch/x86_64/kernel/irq.c
@@ -14,6 +14,7 @@
 #include <linux/interrupt.h>
 #include <linux/seq_file.h>
 #include <linux/module.h>
+#include <linux/delay.h>
 #include <asm/uaccess.h>
 #include <asm/io_apic.h>
 
@@ -106,3 +107,32 @@ asmlinkage unsigned int do_IRQ(struct pt
 
 	return 1;
 }
+
+#ifdef CONFIG_HOTPLUG_CPU
+void fixup_irqs(cpumask_t map)
+{
+	unsigned int irq;
+	static int warned;
+
+	for (irq = 0; irq < NR_IRQS; irq++) {
+		cpumask_t mask;
+		if (irq == 2)
+			continue;
+
+		cpus_and(mask, irq_affinity[irq], map);
+		if (any_online_cpu(mask) == NR_CPUS) {
+			printk("Breaking affinity for irq %i\n", irq);
+			mask = map;
+		}
+		if (irq_desc[irq].handler->set_affinity)
+			irq_desc[irq].handler->set_affinity(irq, mask);
+		else if (irq_desc[irq].action && !(warned++))
+			printk("Cannot set affinity for irq %i\n", irq);
+	}
+
+	/* That doesn't seem sufficient.  Give it 1ms. */
+	local_irq_enable();
+	mdelay(1);
+	local_irq_disable();
+}
+#endif
Index: linux-2.6.12-rc5-mm2/include/asm-x86_64/irq.h
===================================================================
--- linux-2.6.12-rc5-mm2.orig/include/asm-x86_64/irq.h
+++ linux-2.6.12-rc5-mm2/include/asm-x86_64/irq.h
@@ -52,4 +52,9 @@ struct irqaction;
 struct pt_regs;
 int handle_IRQ_event(unsigned int, struct pt_regs *, struct irqaction *);
 
+#ifdef CONFIG_HOTPLUG_CPU
+#include <linux/cpumask.h>
+extern void fixup_irqs(cpumask_t map);
+#endif
+
 #endif /* _ASM_IRQ_H */
Index: linux-2.6.12-rc5-mm2/arch/x86_64/kernel/process.c
===================================================================
--- linux-2.6.12-rc5-mm2.orig/arch/x86_64/kernel/process.c
+++ linux-2.6.12-rc5-mm2/arch/x86_64/kernel/process.c
@@ -8,7 +8,8 @@
  * 
  *  X86-64 port
  *	Andi Kleen.
- * 
+ *
+ *	CPU hotplug support - ashok.raj@intel.com
  *  $Id: process.c,v 1.38 2002/01/15 10:08:03 ak Exp $
  */
 
@@ -18,6 +19,7 @@
 
 #include <stdarg.h>
 
+#include <linux/cpu.h>
 #include <linux/errno.h>
 #include <linux/sched.h>
 #include <linux/kernel.h>
@@ -154,6 +156,29 @@ void cpu_idle_wait(void)
 }
 EXPORT_SYMBOL_GPL(cpu_idle_wait);
 
+#ifdef CONFIG_HOTPLUG_CPU
+DECLARE_PER_CPU(int, cpu_state);
+
+#include <asm/nmi.h>
+/* We don't actually take CPU down, just spin without interrupts. */
+static inline void play_dead(void)
+{
+	idle_task_exit();
+	mb();
+	/* Ack it */
+	__get_cpu_var(cpu_state) = CPU_DEAD;
+
+	local_irq_disable();
+	while (1)
+		safe_halt();
+}
+#else
+static inline void play_dead(void)
+{
+	BUG();
+}
+#endif /* CONFIG_HOTPLUG_CPU */
+
 /*
  * The idle thread. There's no useful work to be
  * done, so just try to conserve power and have a
@@ -174,6 +199,8 @@ void cpu_idle (void)
 			idle = pm_idle;
 			if (!idle)
 				idle = default_idle;
+			if (cpu_is_offline(smp_processor_id()))
+				play_dead();
 			idle();
 		}
 
Index: linux-2.6.12-rc5-mm2/include/asm-x86_64/smp.h
===================================================================
--- linux-2.6.12-rc5-mm2.orig/include/asm-x86_64/smp.h
+++ linux-2.6.12-rc5-mm2/include/asm-x86_64/smp.h
@@ -44,6 +44,8 @@ extern void smp_alloc_memory(void);
 extern volatile unsigned long smp_invalidate_needed;
 extern int pic_mode;
 extern int smp_num_siblings;
+extern void lock_ipi_call_lock(void);
+extern void unlock_ipi_call_lock(void);
 extern void smp_flush_tlb(void);
 extern void smp_message_irq(int cpl, void *dev_id, struct pt_regs *regs);
 extern void smp_send_reschedule(int cpu);
@@ -77,6 +79,8 @@ extern __inline int hard_smp_processor_i
 }
 
 extern int safe_smp_processor_id(void);
+extern int __cpu_disable(void);
+extern void __cpu_die(unsigned int cpu);
 
 #endif /* !ASSEMBLY */
 
Index: linux-2.6.12-rc5-mm2/arch/x86_64/kernel/traps.c
===================================================================
--- linux-2.6.12-rc5-mm2.orig/arch/x86_64/kernel/traps.c
+++ linux-2.6.12-rc5-mm2/arch/x86_64/kernel/traps.c
@@ -589,11 +589,17 @@ static void unknown_nmi_error(unsigned c
 asmlinkage void default_do_nmi(struct pt_regs *regs)
 {
 	unsigned char reason = 0;
+	int cpu;
+
+	cpu = smp_processor_id();
 
 	/* Only the BSP gets external NMIs from the system.  */
-	if (!smp_processor_id())
+	if (!cpu)
 		reason = get_nmi_reason();
 
+	if (!cpu_online(cpu))
+		return;
+
 	if (!(reason & 0xc0)) {
 		if (notify_die(DIE_NMI_IPI, "nmi_ipi", regs, reason, 0, SIGINT)
 								== NOTIFY_STOP)
Index: linux-2.6.12-rc5-mm2/arch/x86_64/kernel/smp.c
===================================================================
--- linux-2.6.12-rc5-mm2.orig/arch/x86_64/kernel/smp.c
+++ linux-2.6.12-rc5-mm2/arch/x86_64/kernel/smp.c
@@ -295,6 +295,18 @@ struct call_data_struct {
 
 static struct call_data_struct * call_data;
 
+void
+lock_ipi_call_lock(void)
+{
+	spin_lock_irq(&call_lock);
+}
+
+void
+unlock_ipi_call_lock(void)
+{
+	spin_unlock_irq(&call_lock);
+}
+
 /*
  * this function sends a 'generic call function' IPI to all other CPUs
  * in the system.
Index: linux-2.6.12-rc5-mm2/arch/i386/mach-default/topology.c
===================================================================
--- linux-2.6.12-rc5-mm2.orig/arch/i386/mach-default/topology.c
+++ linux-2.6.12-rc5-mm2/arch/i386/mach-default/topology.c
@@ -73,12 +73,11 @@ static int __init topology_init(void)
 {
 	int i;
 
-	for (i = 0; i < MAX_NUMNODES; i++) {
-		if (node_online(i))
-			arch_register_node(i);
-	}
-	for (i = 0; i < NR_CPUS; i++)
-		if (cpu_possible(i)) arch_register_cpu(i);
+	for_each_online_node(i)
+		arch_register_node(i);
+
+	for_each_cpu(i)
+		arch_register_cpu(i);
 	return 0;
 }
 
@@ -88,8 +87,8 @@ static int __init topology_init(void)
 {
 	int i;
 
-	for (i = 0; i < NR_CPUS; i++)
-		if (cpu_possible(i)) arch_register_cpu(i);
+	for_each_cpu(i)
+		arch_register_cpu(i);
 	return 0;
 }
 

--



* [patch 3/5] x86_64: CPU hotplug sibling map cleanup
  2005-06-02 12:57 [patch 0/5] x86_64 CPU hotplug patch series Ashok Raj
  2005-06-02 12:57 ` [patch 1/5] x86_64: Change init sections for CPU hotplug support Ashok Raj
  2005-06-02 12:57 ` [patch 2/5] x86_64: " Ashok Raj
@ 2005-06-02 12:57 ` Ashok Raj
  2005-06-02 12:57 ` [patch 4/5] x86_64: Dont use broadcast shortcut to make it cpu hotplug safe Ashok Raj
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Ashok Raj @ 2005-06-02 12:57 UTC (permalink / raw)
  To: Andi Kleen, Andrew Morton
  Cc: linux-kernel, discuss, Rusty Russell, Srivattsa Vaddagiri,
	Zwane Mwaikambo, Ashok Raj

[-- Attachment #1: x86_64-sibling-map-fixup.patch --]
[-- Type: text/plain, Size: 3258 bytes --]

This patch is a minor cleanup of the cpu sibling/core map setup.
This setup is required to happen at per-cpu bringup time.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
------------------------------
 arch/x86_64/kernel/smpboot.c |   82 ++++++++++++++++++-------------------------
 1 files changed, 36 insertions(+), 46 deletions(-)

Index: linux-2.6.12-rc5-mm2/arch/x86_64/kernel/smpboot.c
===================================================================
--- linux-2.6.12-rc5-mm2.orig/arch/x86_64/kernel/smpboot.c
+++ linux-2.6.12-rc5-mm2/arch/x86_64/kernel/smpboot.c
@@ -445,6 +445,34 @@ void __cpuinit smp_callin(void)
 	cpu_set(cpuid, cpu_callin_map);
 }
 
+static inline void
+set_cpu_sibling_map(int cpu)
+{
+	int i;
+
+	if (smp_num_siblings > 1) {
+		for_each_cpu(i) {
+			if (cpu_core_id[cpu] == cpu_core_id[i]) {
+				cpu_set(i, cpu_sibling_map[cpu]);
+				cpu_set(cpu, cpu_sibling_map[i]);
+			}
+		}
+	} else {
+		cpu_set(cpu, cpu_sibling_map[cpu]);
+	}
+
+	if (current_cpu_data.x86_num_cores > 1) {
+		for_each_cpu(i) {
+			if (phys_proc_id[cpu] == phys_proc_id[i]) {
+				cpu_set(i, cpu_core_map[cpu]);
+				cpu_set(cpu, cpu_core_map[i]);
+			}
+		}
+	} else {
+		cpu_core_map[cpu] = cpu_sibling_map[cpu];
+	}
+}
+
 /*
  * Setup code on secondary processor (after comming out of the trampoline)
  */
@@ -475,6 +503,12 @@ void __cpuinit start_secondary(void)
 	enable_APIC_timer();
 
 	/*
+	 * The sibling maps must be set before turning the online map on for
+	 * this cpu
+	 */
+	set_cpu_sibling_map(smp_processor_id());
+
+	/*
 	 * Allow the master to continue.
 	 */
 	lock_ipi_call_lock();
@@ -809,51 +843,6 @@ cycles_t cacheflush_time;
 unsigned long cache_decay_ticks;
 
 /*
- * Construct cpu_sibling_map[], so that we can tell the sibling CPU
- * on SMT systems efficiently.
- */
-static __cpuinit void detect_siblings(void)
-{
-	int cpu;
-
-	for (cpu = 0; cpu < NR_CPUS; cpu++) {
-		cpus_clear(cpu_sibling_map[cpu]);
-		cpus_clear(cpu_core_map[cpu]);
-	}
-
-	for_each_online_cpu (cpu) {
-		struct cpuinfo_x86 *c = cpu_data + cpu;
-		int siblings = 0;
-		int i;
-		if (smp_num_siblings > 1) {
-			for_each_online_cpu (i) {
-				if (cpu_core_id[cpu] == cpu_core_id[i]) {
-					siblings++;
-					cpu_set(i, cpu_sibling_map[cpu]);
-				}
-			}
-		} else {
-			siblings++;
-			cpu_set(cpu, cpu_sibling_map[cpu]);
-		}
-
-		if (siblings != smp_num_siblings) {
-			printk(KERN_WARNING
-	       "WARNING: %d siblings found for CPU%d, should be %d\n",
-			       siblings, cpu, smp_num_siblings);
-			smp_num_siblings = siblings;
-		}
-		if (c->x86_num_cores > 1) {
-			for_each_online_cpu(i) {
-				if (phys_proc_id[cpu] == phys_proc_id[i])
-					cpu_set(i, cpu_core_map[cpu]);
-			}
-		} else
-			cpu_core_map[cpu] = cpu_sibling_map[cpu];
-	}
-}
-
-/*
  * Cleanup possible dangling ends...
  */
 static __cpuinit void smp_cleanup_boot(void)
@@ -1040,6 +1029,8 @@ void __init smp_prepare_boot_cpu(void)
 	int me = smp_processor_id();
 	cpu_set(me, cpu_online_map);
 	cpu_set(me, cpu_callout_map);
+	cpu_set(0, cpu_sibling_map[0]);
+	cpu_set(0, cpu_core_map[0]);
 }
 
 /*
@@ -1099,7 +1090,6 @@ void __init smp_cpus_done(unsigned int m
 	setup_ioapic_dest();
 #endif
 
-	detect_siblings();
 	time_init_gtod();
 
 	check_nmi_watchdog();

--



* [patch 4/5] x86_64: Dont use broadcast shortcut to make it cpu hotplug safe.
  2005-06-02 12:57 [patch 0/5] x86_64 CPU hotplug patch series Ashok Raj
                   ` (2 preceding siblings ...)
  2005-06-02 12:57 ` [patch 3/5] x86_64: CPU hotplug sibling map cleanup Ashok Raj
@ 2005-06-02 12:57 ` Ashok Raj
  2005-06-02 12:58 ` [patch 5/5] x86_64: Provide ability to choose using shortcuts for IPI in flat mode Ashok Raj
  2005-06-02 20:25 ` [patch 0/5] x86_64 CPU hotplug patch series Zwane Mwaikambo
  5 siblings, 0 replies; 19+ messages in thread
From: Ashok Raj @ 2005-06-02 12:57 UTC (permalink / raw)
  To: Andi Kleen, Andrew Morton
  Cc: linux-kernel, discuss, Rusty Russell, Srivattsa Vaddagiri,
	Zwane Mwaikambo, Ashok Raj

[-- Attachment #1: no_broadcast_ipi.patch --]
[-- Type: text/plain, Size: 3179 bytes --]

[Andi doesn't like this simpler approach, and prefers to continue to use
 broadcast mode, with a more complex __cpu_up() to overcome the ill effects
 of broadcast. Hence based on his request the next patch (following this) 
 is provided to provide a run-time switch capability]

Broadcast IPIs cause unexpected behaviour for cpu hotplug: CPUs in the
offline state also end up receiving the IPI. Once those cpus come online,
they see these stale IPIs, which introduce unexpected behaviour.

This is easily avoided by not broadcasting and instead addressing just the
CPUs in the online map. Preliminary cycle counts suggest there is no big
overhead: numbers average around 0x3000-0x3900 cycles on x86 and x86_64
systems with CPUs running at 3GHz, for both the broadcast and mask versions
of the APIs.

The shortcuts are useful only in flat mode (where perf shows no
degradation), and in cluster mode it is unicast anyway. It is simpler
to just not use broadcast anymore.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
-----------------------------------------
 arch/x86_64/kernel/genapic_flat.c |   41 +++++++++++++++++++++++---------------
 1 files changed, 25 insertions(+), 16 deletions(-)

Index: linux-2.6.12-rc5-mm2/arch/x86_64/kernel/genapic_flat.c
===================================================================
--- linux-2.6.12-rc5-mm2.orig/arch/x86_64/kernel/genapic_flat.c
+++ linux-2.6.12-rc5-mm2/arch/x86_64/kernel/genapic_flat.c
@@ -7,6 +7,8 @@
  * Hacked for x86-64 by James Cleverdon from i386 architecture code by
  * Martin Bligh, Andi Kleen, James Bottomley, John Stultz, and
  * James Cleverdon.
+ * Ashok Raj <ashok.raj@intel.com>
+ * 	Removed IPI broadcast shortcut to support CPU hotplug
  */
 #include <linux/config.h>
 #include <linux/threads.h>
@@ -45,22 +47,6 @@ static void flat_init_apic_ldr(void)
 	apic_write_around(APIC_LDR, val);
 }
 
-static void flat_send_IPI_allbutself(int vector)
-{
-	/*
-	 * if there are no other CPUs in the system then
-	 * we get an APIC send error if we try to broadcast.
-	 * thus we have to avoid sending IPIs in this case.
-	 */
-	if (num_online_cpus() > 1)
-		__send_IPI_shortcut(APIC_DEST_ALLBUT, vector, APIC_DEST_LOGICAL);
-}
-
-static void flat_send_IPI_all(int vector)
-{
-	__send_IPI_shortcut(APIC_DEST_ALLINC, vector, APIC_DEST_LOGICAL);
-}
-
 static void flat_send_IPI_mask(cpumask_t cpumask, int vector)
 {
 	unsigned long mask = cpus_addr(cpumask)[0];
@@ -93,6 +79,29 @@ static void flat_send_IPI_mask(cpumask_t
 	local_irq_restore(flags);
 }
 
+static void flat_send_IPI_allbutself(int vector)
+{
+	cpumask_t mask;
+	/*
+	 * if there are no other CPUs in the system then
+	 * we get an APIC send error if we try to broadcast.
+	 * thus we have to avoid sending IPIs in this case.
+	 */
+	get_cpu();
+	mask = cpu_online_map;
+	cpu_clear(smp_processor_id(), mask);
+
+	if (cpus_weight(mask) >= 1)
+		flat_send_IPI_mask(mask, vector);
+
+	put_cpu();
+}
+
+static void flat_send_IPI_all(int vector)
+{
+	flat_send_IPI_mask(cpu_online_map, vector);
+}
+
 static int flat_apic_id_registered(void)
 {
 	return physid_isset(GET_APIC_ID(apic_read(APIC_ID)), phys_cpu_present_map);

--



* [patch 5/5] x86_64: Provide ability to choose using shortcuts for IPI in flat mode.
  2005-06-02 12:57 [patch 0/5] x86_64 CPU hotplug patch series Ashok Raj
                   ` (3 preceding siblings ...)
  2005-06-02 12:57 ` [patch 4/5] x86_64: Dont use broadcast shortcut to make it cpu hotplug safe Ashok Raj
@ 2005-06-02 12:58 ` Ashok Raj
  2005-06-02 20:10   ` Zwane Mwaikambo
  2005-06-02 20:25 ` [patch 0/5] x86_64 CPU hotplug patch series Zwane Mwaikambo
  5 siblings, 1 reply; 19+ messages in thread
From: Ashok Raj @ 2005-06-02 12:58 UTC (permalink / raw)
  To: Andi Kleen, Andrew Morton
  Cc: linux-kernel, discuss, Rusty Russell, Srivattsa Vaddagiri,
	Zwane Mwaikambo, Ashok Raj

[-- Attachment #1: choose_mask_or_broadcast.patch --]
[-- Type: text/plain, Size: 4779 bytes --]

This patch provides an option to choose between the broadcast and mask
versions for sending IPIs. If CONFIG_HOTPLUG_CPU is defined, we choose not
to use broadcast shortcuts by default; otherwise we choose broadcast mode
as the default.

In both cases, one can switch to no-broadcast mode via a startup command
line option:

	no_ipi_broadcast=1

This is provided on request from Andi Kleen, since he doesn't agree with
replacing IPI shortcuts as a solution for CPU hotplug. Without removing
broadcast IPIs, it would mean lots of new code in the __cpu_up() path
to achieve the same results.

I will send the primitive interface next as a separate patch from this
patch set.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
----------------------------------------------------
 arch/x86_64/kernel/genapic_flat.c |   84 ++++++++++++++++++++++++++++++++++++--
 1 files changed, 80 insertions(+), 4 deletions(-)

Index: linux-2.6.12-rc5-mm2/arch/x86_64/kernel/genapic_flat.c
===================================================================
--- linux-2.6.12-rc5-mm2.orig/arch/x86_64/kernel/genapic_flat.c
+++ linux-2.6.12-rc5-mm2/arch/x86_64/kernel/genapic_flat.c
@@ -20,6 +20,46 @@
 #include <asm/smp.h>
 #include <asm/ipi.h>
 
+/*
+ * The following permits choosing the broadcast IPI shortcut vs. sending IPIs
+ * only to online cpus via the send_IPI_mask variant.
+ * The mask version is my preferred option, since it eliminates a lot of
+ * extra code that would otherwise need to be written to clean up interrupts
+ * sent to a CPU while it was offline.
+ *
+ * Sending broadcasts introduces lots of trouble in CPU hotplug situations.
+ * These IPIs are delivered to cpus irrespective of their offline status
+ * and could pick up stale intr data when those CPUs are turned online.
+ *
+ * Not using broadcast is a cleaner approach IMO, but Andi Kleen disagrees
+ * with the idea of dropping broadcast IPIs entirely. Hence the run-time
+ * check is introduced at his request, so we can choose an alternate
+ * mechanism.
+ *
+ * Initial rough performance tests that collect cycle counts show no
+ * increase using the mask vs. the broadcast version. In fact they seem
+ * identical in terms of cycle counts.
+ *
+ * if we need to use broadcast, we need to do the following.
+ *
+ * cli;
+ * hold call_lock;
+ * clear any pending IPI, just ack and clear all pending intr
+ * set cpu_online_map;
+ * release call_lock;
+ * sti;
+ *
+ * The complicated dummy irq processing shown above is not required if
+ * we don't send IPIs to the wrong CPUs in the first place.
+ *
+ * - Ashok Raj <ashok.raj@intel.com>
+ */
+#ifdef CONFIG_HOTPLUG_CPU
+#define DEFAULT_SEND_IPI	(1)
+#else
+#define DEFAULT_SEND_IPI	(0)
+#endif
+
+static int no_broadcast = DEFAULT_SEND_IPI;
 
 static cpumask_t flat_target_cpus(void)
 {
@@ -79,27 +119,44 @@ static void flat_send_IPI_mask(cpumask_t
 	local_irq_restore(flags);
 }
 
+static inline void __local_flat_send_IPI_allbutself(cpumask_t mask, int vector)
+{
+	if (no_broadcast)
+		flat_send_IPI_mask(mask, vector);
+	else
+		__send_IPI_shortcut(APIC_DEST_ALLBUT, vector, APIC_DEST_LOGICAL);
+}
+
+static inline void __local_flat_send_IPI_all(cpumask_t mask, int vector)
+{
+	if (no_broadcast)
+		flat_send_IPI_mask(mask, vector);
+	else
+		__send_IPI_shortcut(APIC_DEST_ALLINC, vector, APIC_DEST_LOGICAL);
+}
+
 static void flat_send_IPI_allbutself(int vector)
 {
 	cpumask_t mask;
+	int this_cpu;
 	/*
 	 * if there are no other CPUs in the system then
 	 * we get an APIC send error if we try to broadcast.
 	 * thus we have to avoid sending IPIs in this case.
 	 */
-	get_cpu();
+	this_cpu = get_cpu();
 	mask = cpu_online_map;
-	cpu_clear(smp_processor_id(), mask);
+	cpu_clear(this_cpu, mask);
 
 	if (cpus_weight(mask) >= 1)
-		flat_send_IPI_mask(mask, vector);
+		__local_flat_send_IPI_allbutself(mask, vector);
 
 	put_cpu();
 }
 
 static void flat_send_IPI_all(int vector)
 {
-	flat_send_IPI_mask(cpu_online_map, vector);
+	__local_flat_send_IPI_all(cpu_online_map, vector);
 }
 
 static int flat_apic_id_registered(void)
@@ -120,6 +177,16 @@ static unsigned int phys_pkg_id(int inde
 	return ((ebx >> 24) & 0xFF) >> index_msb;
 }
 
+static __init int no_ipi_broadcast(char *str)
+{
+	get_option(&str, &no_broadcast);
+	printk("Using %s mode\n",
+	       no_broadcast ? "No IPI Broadcast" : "IPI Broadcast");
+	return 1;
+}
+
+__setup("no_ipi_broadcast", no_ipi_broadcast);
+
 struct genapic apic_flat =  {
 	.name = "flat",
 	.int_delivery_mode = dest_LowestPrio,
@@ -134,3 +201,12 @@ struct genapic apic_flat =  {
 	.cpu_mask_to_apicid = flat_cpu_mask_to_apicid,
 	.phys_pkg_id = phys_pkg_id,
 };
+
+int print_ipi_mode(void)
+{
+	printk("Using IPI %s mode\n",
+	       no_broadcast ? "No-Shortcut" : "Shortcut");
+	return 0;
+}
+
+late_initcall(print_ipi_mode);

--



* Re: [patch 5/5] x86_64: Provide ability to choose using shortcuts for IPI in flat mode.
  2005-06-02 12:58 ` [patch 5/5] x86_64: Provide ability to choose using shortcuts for IPI in flat mode Ashok Raj
@ 2005-06-02 20:10   ` Zwane Mwaikambo
  2005-06-02 23:15     ` Ashok Raj
  0 siblings, 1 reply; 19+ messages in thread
From: Zwane Mwaikambo @ 2005-06-02 20:10 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Andi Kleen, Andrew Morton, linux-kernel, discuss, Rusty Russell,
	Srivattsa Vaddagiri

On Thu, 2 Jun 2005, Ashok Raj wrote:

> This patch provides an option to switch broadcast or use mask version 
> for sending IPI's. If CONFIG_HOTPLUG_CPU is defined, we choose not to 
> use broadcast shortcuts by default, otherwise we choose broadcast mode
> as default.
> 
> both cases, one can change this via startup cmd line option, to choose
> no-broadcast mode.
> 
> 	no_ipi_broadcast=1
> 
> This is provided on request from Andi Kleen, since he doesnt agree with 
> replacing IPI shortcuts as a solution for CPU hotplug. Without removing
> broadcast IPI's, it would mean lots of new code for __cpu_up() path, 
> which would acheive the same results.
> 
> I will send the primitive interface next as a separate patch from this
> patch set.
> 
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>

Have you already submitted the i386 version? I think it's a sane fix.

Thanks,
	Zwane


* Re: [patch 1/5] x86_64: Change init sections for CPU hotplug support
  2005-06-02 12:57 ` [patch 1/5] x86_64: Change init sections for CPU hotplug support Ashok Raj
@ 2005-06-02 20:14   ` Zwane Mwaikambo
  2005-06-02 23:19     ` Ashok Raj
  0 siblings, 1 reply; 19+ messages in thread
From: Zwane Mwaikambo @ 2005-06-02 20:14 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Andi Kleen, Andrew Morton, linux-kernel, discuss, Rusty Russell,
	Srivattsa Vaddagiri

On Thu, 2 Jun 2005, Ashok Raj wrote:

> This patch adds __cpuinit and __cpuinitdata sections that need to exist
> past boot to support cpu hotplug.
> 
> Caveat: This is done *only* for EM64T CPU Hotplug support, on request from
> Andi Kleen. Much of the generic hotplug code in kernel, and none of the 
> other archs that support CPU hotplug today, i386, ia64, ppc64, s390 and
> parisc dont mark sections with __cpuinit, but only mark them as __devinit, 
> and __devinitdata.
> 
> If someone is motivated to change generic code, we need to make sure all
> existing hotplug code does not break, on other arch's that dont use 
> __cpuinit, and __cpudevinit.

I'll do i386.

Thanks,
	Zwane



* Re: [patch 2/5] x86_64: CPU hotplug support.
  2005-06-02 12:57 ` [patch 2/5] x86_64: " Ashok Raj
@ 2005-06-02 20:19   ` Zwane Mwaikambo
  2005-06-02 23:33     ` Ashok Raj
  0 siblings, 1 reply; 19+ messages in thread
From: Zwane Mwaikambo @ 2005-06-02 20:19 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Andi Kleen, Andrew Morton, linux-kernel, discuss, Rusty Russell,
	Srivattsa Vaddagiri

On Thu, 2 Jun 2005, Ashok Raj wrote:

> @@ -445,8 +477,10 @@ void __cpuinit start_secondary(void)
>  	/*
>  	 * Allow the master to continue.
>  	 */
> +	lock_ipi_call_lock();
>  	cpu_set(smp_processor_id(), cpu_online_map);
>  	mb();
> +	unlock_ipi_call_lock();

What's that? Is this another smp_call_function race workaround? I thought 
there was an additional patch to avoid the broadcast.

> +#include <asm/nmi.h>
> +/* We don't actually take CPU down, just spin without interrupts. */
> +static inline void play_dead(void)
> +{
> +	idle_task_exit();
> +	mb();
> +	/* Ack it */
> +	__get_cpu_var(cpu_state) = CPU_DEAD;
> +
> +	local_irq_disable();
> +	while (1)
> +		safe_halt();
> +}

Might as well drop the local_irq_disable since safe_halt enables 
interrupts.

	Zwane



* Re: [patch 0/5] x86_64 CPU hotplug patch series.
  2005-06-02 12:57 [patch 0/5] x86_64 CPU hotplug patch series Ashok Raj
                   ` (4 preceding siblings ...)
  2005-06-02 12:58 ` [patch 5/5] x86_64: Provide ability to choose using shortcuts for IPI in flat mode Ashok Raj
@ 2005-06-02 20:25 ` Zwane Mwaikambo
  2005-06-03 16:35   ` Andi Kleen
  5 siblings, 1 reply; 19+ messages in thread
From: Zwane Mwaikambo @ 2005-06-02 20:25 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Andi Kleen, Andrew Morton, linux-kernel, discuss, Rusty Russell,
	Srivattsa Vaddagiri

On Thu, 2 Jun 2005, Ashok Raj wrote:

> Andrew: Could you help test staging in -mm so we can get some wider testing
> from those interested.
> 
> *Sore Point*: Andi doesnt agree with one patch that removes ipi-broadcast 
> and uses only online map cpus receive IPI's. This is much simpler approach to 
> handle instead of trying to remove the ill effects of IPI broadcast to CPUs in 
> offline state.
> 
> Initial concern from Andi was IPI performance, but some primitive test with a 
> good number of samples doesnt seem to indicate any degration at all, infact the
> results seem identical. (Barring any operator errors :-( ).
> 
> It would be nice to hear other opinions as well, hopefuly we can close on
> what what the right approach in this case. Link to an earlier discussion
> on the topic.

I don't think it's worth the extra boot time complexity to use the boot 
workaround and i'm not convinced the extra mask against cpu_online_map 
slows down that path enough to show up compared to waiting for remote 
processor IPI handling to commence/complete.

Thanks,
	Zwane


* Re: [patch 5/5] x86_64: Provide ability to choose using shortcuts for IPI in flat mode.
  2005-06-02 20:10   ` Zwane Mwaikambo
@ 2005-06-02 23:15     ` Ashok Raj
  0 siblings, 0 replies; 19+ messages in thread
From: Ashok Raj @ 2005-06-02 23:15 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: Ashok Raj, Andi Kleen, Andrew Morton, linux-kernel, discuss,
	Rusty Russell, Srivattsa Vaddagiri

On Thu, Jun 02, 2005 at 02:10:47PM -0600, Zwane Mwaikambo wrote:
> On Thu, 2 Jun 2005, Ashok Raj wrote:
> 
> > This patch provides an option to switch broadcast or use mask version 
> > for sending IPI's. If CONFIG_HOTPLUG_CPU is defined, we choose not to 
> > use broadcast shortcuts by default, otherwise we choose broadcast mode
> > as default.
> > 
> > both cases, one can change this via startup cmd line option, to choose
> > no-broadcast mode.
> > 
> > 	no_ipi_broadcast=1
> > 
> Have you already submitted the i386 version? I think it's a sane fix.


I haven't submitted an i386 version yet; I can do the same there as well for
flat mode. I will send one over next...

-- 
Cheers,
Ashok Raj
- Open Source Technology Center


* Re: [patch 1/5] x86_64: Change init sections for CPU hotplug support
  2005-06-02 20:14   ` Zwane Mwaikambo
@ 2005-06-02 23:19     ` Ashok Raj
  0 siblings, 0 replies; 19+ messages in thread
From: Ashok Raj @ 2005-06-02 23:19 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: Ashok Raj, Andi Kleen, Andrew Morton, linux-kernel, discuss,
	Rusty Russell, Srivattsa Vaddagiri

On Thu, Jun 02, 2005 at 02:14:28PM -0600, Zwane Mwaikambo wrote:
> On Thu, 2 Jun 2005, Ashok Raj wrote:
> 
> > This patch adds __cpuinit and __cpuinitdata sections that need to exist
> > past boot to support cpu hotplug.
> > 
> > Caveat: This is done *only* for EM64T CPU Hotplug support, on request from
> > Andi Kleen. Much of the generic hotplug code in kernel, and none of the 
> > other archs that support CPU hotplug today, i386, ia64, ppc64, s390 and
> > parisc dont mark sections with __cpuinit, but only mark them as __devinit, 
> > and __devinitdata.
> > 
> > If someone is motivated to change generic code, we need to make sure all
> > existing hotplug code does not break, on other arch's that dont use 
> > __cpuinit, and __cpudevinit.
> 
> I'll do i386.
> 

The generic kernel pieces also need to be converted; that is probably the
bigger chunk of code that could be marked __cpuinit in cases where
hotplug is not supported.

I can do the ia64 pieces...

-- 
Cheers,
Ashok Raj
- Open Source Technology Center


* Re: [patch 2/5] x86_64: CPU hotplug support.
  2005-06-02 20:19   ` Zwane Mwaikambo
@ 2005-06-02 23:33     ` Ashok Raj
  2005-06-02 23:45       ` Zwane Mwaikambo
  2005-06-03  2:01       ` Shaohua Li
  0 siblings, 2 replies; 19+ messages in thread
From: Ashok Raj @ 2005-06-02 23:33 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: Ashok Raj, Andi Kleen, Andrew Morton, linux-kernel, discuss,
	Rusty Russell, Srivattsa Vaddagiri

On Thu, Jun 02, 2005 at 02:19:55PM -0600, Zwane Mwaikambo wrote:
> On Thu, 2 Jun 2005, Ashok Raj wrote:
> 
> > @@ -445,8 +477,10 @@ void __cpuinit start_secondary(void)
> >  	/*
> >  	 * Allow the master to continue.
> >  	 */
> > +	lock_ipi_call_lock();
> >  	cpu_set(smp_processor_id(), cpu_online_map);
> >  	mb();
> > +	unlock_ipi_call_lock();
> 
> What's that? Is this another smp_call_function race workaround? I thought 
> there was an additional patch to avoid the broadcast.

The other patch avoids sending to offline cpus, but we read cpu_online_map
and clear the self bit in smp_call_function. If a cpu comes online, don't we
want that cpu to take part in smp_call_function?

If we don't care about this new CPU participating, and if cpu_set() is atomic
(for all NR_CPUS), we don't need to hold call_lock; otherwise we need to hold
it here as well.

> 
> > +#include <asm/nmi.h>
> > +/* We don't actually take CPU down, just spin without interrupts. */
> > +static inline void play_dead(void)
> > +{
> > +	idle_task_exit();
> > +	mb();
> > +	/* Ack it */
> > +	__get_cpu_var(cpu_state) = CPU_DEAD;
> > +
> > +	local_irq_disable();
> > +	while (1)
> > +		safe_halt();
> > +}
> 
> Might as well drop the local_irq_disable since safe_halt enables 
> interrupts.
> 

Will do. I think we could clean up more of the cpu state before we do
safe_halt.

But I need to look at how much more state can be cleaned out.

On ia64, we have a clear SAL handoff state: essentially the cpu is handed
back to BIOS land just as the OS received it. Although that might be a hard
goal here, I need to look at which states _must_ be cleaned out as a next
step.


-- 
Cheers,
Ashok Raj
- Open Source Technology Center


* Re: [patch 2/5] x86_64: CPU hotplug support.
  2005-06-02 23:33     ` Ashok Raj
@ 2005-06-02 23:45       ` Zwane Mwaikambo
  2005-06-03  0:08         ` Ashok Raj
  2005-06-03  2:01       ` Shaohua Li
  1 sibling, 1 reply; 19+ messages in thread
From: Zwane Mwaikambo @ 2005-06-02 23:45 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Andi Kleen, Andrew Morton, linux-kernel, discuss, Rusty Russell,
	Srivattsa Vaddagiri

On Thu, 2 Jun 2005, Ashok Raj wrote:

> > > +	lock_ipi_call_lock();
> > >  	cpu_set(smp_processor_id(), cpu_online_map);
> > >  	mb();
> > > +	unlock_ipi_call_lock();
> > 
> > What's that? Is this another smp_call_function race workaround? I thought 
> > there was an additional patch to avoid the broadcast.
> 
> The other patch avoids sending to offline cpu's, but we read cpu_online_map
> and clear self bit in smp_call_function. If a cpu comes online, dont we 
> want this cpu to take part in smp_call_function?

The lock being held in smp_call_function whilst we access cpu_online_map
should prevent another processor coming online within that operation,
shouldn't it? So there shouldn't be any processors coming online except
right before or after an smp_call_function.




* Re: [patch 2/5] x86_64: CPU hotplug support.
  2005-06-02 23:45       ` Zwane Mwaikambo
@ 2005-06-03  0:08         ` Ashok Raj
  0 siblings, 0 replies; 19+ messages in thread
From: Ashok Raj @ 2005-06-03  0:08 UTC (permalink / raw)
  To: Zwane Mwaikambo, y
  Cc: Ashok Raj, Andi Kleen, Andrew Morton, linux-kernel, discuss,
	Rusty Russell, Srivattsa Vaddagiri

On Thu, Jun 02, 2005 at 05:45:14PM -0600, Zwane Mwaikambo wrote:
> On Thu, 2 Jun 2005, Ashok Raj wrote:
> 
> > > > +	lock_ipi_call_lock();
> > > >  	cpu_set(smp_processor_id(), cpu_online_map);
> > > >  	mb();
> > > > +	unlock_ipi_call_lock();
> > > 
> > > What's that? Is this another smp_call_function race workaround? I thought 
> > > there was an additional patch to avoid the broadcast.
> > 
> > The other patch avoids sending to offline cpu's, but we read cpu_online_map
> > and clear self bit in smp_call_function. If a cpu comes online, dont we 
> > want this cpu to take part in smp_call_function?
> 
> The lock being held in smp_call_function whilst we access cpu_online_map 
> should prevent another processor coming online within that operation 
> shouldn't it? So There shouldn't be any processors coming online except 
> for right after or before an smp_call_function.

Precisely why we hold the same lock when we set the bit in cpu_online_map
during cpu_up as well.


* Re: [patch 2/5] x86_64: CPU hotplug support.
  2005-06-02 23:33     ` Ashok Raj
  2005-06-02 23:45       ` Zwane Mwaikambo
@ 2005-06-03  2:01       ` Shaohua Li
  2005-06-03 14:25         ` Ashok Raj
  1 sibling, 1 reply; 19+ messages in thread
From: Shaohua Li @ 2005-06-03  2:01 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Zwane Mwaikambo, ak, akpm, lkml, x86-64, Rusty Russell,
	Srivattsa Vaddagiri

On Thu, 2005-06-02 at 16:33 -0700, Ashok Raj wrote:
> On Thu, Jun 02, 2005 at 02:19:55PM -0600, Zwane Mwaikambo wrote:
> > On Thu, 2 Jun 2005, Ashok Raj wrote:
> > 
> > > @@ -445,8 +477,10 @@ void __cpuinit start_secondary(void)
> > >  	/*
> > >  	 * Allow the master to continue.
> > >  	 */
> > > +	lock_ipi_call_lock();
> > >  	cpu_set(smp_processor_id(), cpu_online_map);
> > >  	mb();
> > > +	unlock_ipi_call_lock();
> > 
> > What's that? Is this another smp_call_function race workaround? I thought 
> > there was an additional patch to avoid the broadcast.
> 
> The other patch avoids sending to offline cpu's, but we read cpu_online_map
> and clear self bit in smp_call_function. If a cpu comes online, dont we 
> want this cpu to take part in smp_call_function?
> 
> if we dont care about this new CPU participating, and if cpu_set() is atomic
> (for all NR_CPUS) we dont need to hold call_lock, otherwise we need to hold
> this as well.
If a CPU isn't online, why should it participate? And if it should
participate, it might just as well do the same thing before setting itself
online.
To me, the places which really care about this, such as smp_send_stop,
should hold the cpucontrol semaphore.

Thanks,
Shaohua



* Re: [patch 2/5] x86_64: CPU hotplug support.
  2005-06-03  2:01       ` Shaohua Li
@ 2005-06-03 14:25         ` Ashok Raj
  0 siblings, 0 replies; 19+ messages in thread
From: Ashok Raj @ 2005-06-03 14:25 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Ashok Raj, Zwane Mwaikambo, ak, akpm, lkml, x86-64,
	Rusty Russell, Srivattsa Vaddagiri

On Fri, Jun 03, 2005 at 10:01:55AM +0800, Shaohua Li wrote:
> On Thu, 2005-06-02 at 16:33 -0700, Ashok Raj wrote:
> > On Thu, Jun 02, 2005 at 02:19:55PM -0600, Zwane Mwaikambo wrote:
> > > On Thu, 2 Jun 2005, Ashok Raj wrote:
> > > 
> > > > @@ -445,8 +477,10 @@ void __cpuinit start_secondary(void)
> > > >  	/*
> > > >  	 * Allow the master to continue.
> > > >  	 */
> > > > +	lock_ipi_call_lock();
> > > >  	cpu_set(smp_processor_id(), cpu_online_map);
> > > >  	mb();
> > > > +	unlock_ipi_call_lock();
> > > 
> > > What's that? Is this another smp_call_function race workaround? I thought 
> > > there was an additional patch to avoid the broadcast.
> > 
> > The other patch avoids sending to offline cpu's, but we read cpu_online_map
> > and clear self bit in smp_call_function. If a cpu comes online, dont we 
> > want this cpu to take part in smp_call_function?
> > 
> > if we dont care about this new CPU participating, and if cpu_set() is atomic
> > (for all NR_CPUS) we dont need to hold call_lock, otherwise we need to hold
> > this as well.
> If a CPU isn't online, why should it participates it? If it should
> participate it, it also might do the similar thing before set cpu
> online.

Good point. I was just trying to include the newly arrived cpu in the
set, but I can't convince myself that there is any real value in including
it in that case. I can drop it.

> Some places which really care about it such as smp_send_stop should hold
> cpucontrol semaphore to me.

panic() ends up calling smp_send_stop(); I don't think we could hold a sema
in that path if we end up being called from intr context.

We could probably add lock_cpu_hotplug in the callers of
stop_machine/restart, though I'm not too sure how useful that would be.
> 
> Thanks,
> Shaohua
> 

-- 
Cheers,
Ashok Raj
- Open Source Technology Center


* Re: [patch 0/5] x86_64 CPU hotplug patch series.
  2005-06-02 20:25 ` [patch 0/5] x86_64 CPU hotplug patch series Zwane Mwaikambo
@ 2005-06-03 16:35   ` Andi Kleen
  2005-06-03 17:15     ` Ashok Raj
  0 siblings, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2005-06-03 16:35 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: Andrew Morton, linux-kernel, discuss, Rusty Russell,
	Srivattsa Vaddagiri, ashok.raj

Zwane Mwaikambo <zwane@arm.linux.org.uk> writes:

> On Thu, 2 Jun 2005, Ashok Raj wrote:
>
>> Andrew: Could you help test staging in -mm so we can get some wider testing
>> from those interested.
>> 
>> *Sore Point*: Andi doesn't agree with one patch that removes the IPI 
>> broadcast so that only CPUs in the online map receive IPIs. This is a much 
>> simpler approach than trying to remove the ill effects of IPI broadcast on 
>> CPUs in the offline state.
>> 
>> Andi's initial concern was IPI performance, but a primitive test with a 
>> good number of samples doesn't indicate any degradation at all; in fact the
>> results seem identical. (Barring any operator errors :-( ).
>> 
>> It would be nice to hear other opinions as well; hopefully we can close on
>> what the right approach is in this case. Link to an earlier discussion
>> on the topic.
>
> I don't think it's worth the extra boot-time complexity to use the boot 
> workaround, and I'm not convinced the extra mask against cpu_online_map 
> slows down that path enough to show up compared to waiting for remote 
> processor IPI handling to commence/complete.

What boot slowdown? 

I assume any practical CPU hotplug will have a way to detect it 
at boot - e.g. ACPI will probably need to tell you about spare
CPUs that could be started or there is a command line option.

My request was basically to set a flag when "CPU hotplug possible"
is detected and then only use the slower method in the fast path when
CPU hotplug is possible.

Actually that was only the second-best solution; better would
be to just fix the relatively obscure race in the CPU hotplug bootup
path, but Ashok for some reason seems to be very averse to that
option.

-Andi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [patch 0/5] x86_64 CPU hotplug patch series.
  2005-06-03 16:35   ` Andi Kleen
@ 2005-06-03 17:15     ` Ashok Raj
  0 siblings, 0 replies; 19+ messages in thread
From: Ashok Raj @ 2005-06-03 17:15 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Zwane Mwaikambo, Andrew Morton, linux-kernel, discuss,
	Rusty Russell, Srivattsa Vaddagiri, ashok.raj

On Fri, Jun 03, 2005 at 06:35:01PM +0200, Andi Kleen wrote:
> Zwane Mwaikambo <zwane@arm.linux.org.uk> writes:
> 
> >
> > I don't think it's worth the extra boot-time complexity to use the boot 
> > workaround, and I'm not convinced the extra mask against cpu_online_map 
> > slows down that path enough to show up compared to waiting for remote 
> > processor IPI handling to commence/complete.
> 
> What boot slowdown? 

It's really not a slowdown, but unnecessary complexity: either in terms
of detecting whether to use broadcast, or, as you mention, continuing
to perform broadcast and then handling the cleanup of queued IPIs before 
turning the CPU online.

> 
> I assume any practical CPU hotplug will have a way to detect it 
> at boot - e.g. ACPI will probably need to tell you about spare
> CPUs that could be started or there is a command line option.

What about the case where we need logical CPU up/down for suspend/resume?
That is really not platform hotplug, just another feature.

Now you select this by default, and in reality all mobile and server
systems would end up using it, so all the code to detect hotplug
availability would just bit-rot.

The ACPI namespace is pretty static, so if you have a 4-socket system with
only 2 sockets populated, all you see is 2 present and 2 not present.

The only indication that hotplug is supported would be the presence of _EJD.
Again, if this is a NUMA node, or is wrapped under a bigger module device,
it is probably required only in the top-level object. 

There is nothing there that suggests the platform supports hotplug; maybe
we could do that via mach-*, but it seems unnecessary to take on all this
complexity without knowing what problem we are solving.
> 
> My request was basically to set a flag when "CPU hotplug possible"
> is detected and then only use the slower method in the fast path when
> CPU hotplug is possible.

Why do you call the mask version slow? The TSC stats test case
I sent out doesn't show any indication of slowness by any means; broadcast
and mask came out about equal in the numbers.

There is just one extra write to the local APIC, but that doesn't seem 
significant enough to be called slow in the grand scheme of 
events.

> 
> Actually that was only the second best solution, better would
> be to just fix the relatively obscure race in the CPU hotplug bootup
> path, but Ashok for some reason seems to be very adverse to that
> option.

The easier way to fix the problem is to fix its source. So 
if broadcast is introducing an unnecessary race, why not just eliminate it
altogether, when there is really no loss in performance per se? Writing a
whole bunch of code to handle the race doesn't seem appealing.

If we can fix it with less code and there is no *real* loss in performance, 
why not pick the less complicated approach?

The cycle-count results don't seem to indicate any slowness, yet you
frequently describe the mask variant as the slow approach. Doing an extra
register write doesn't matter; what matters is how long smp_call_function()
takes, and that seems pretty much the same in both instances.

-- 
Cheers,
Ashok Raj
- Open Source Technology Center

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2005-06-03 17:16 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
2005-06-02 12:57 [patch 0/5] x86_64 CPU hotplug patch series Ashok Raj
2005-06-02 12:57 ` [patch 1/5] x86_64: Change init sections for CPU hotplug support Ashok Raj
2005-06-02 20:14   ` Zwane Mwaikambo
2005-06-02 23:19     ` Ashok Raj
2005-06-02 12:57 ` [patch 2/5] x86_64: " Ashok Raj
2005-06-02 20:19   ` Zwane Mwaikambo
2005-06-02 23:33     ` Ashok Raj
2005-06-02 23:45       ` Zwane Mwaikambo
2005-06-03  0:08         ` Ashok Raj
2005-06-03  2:01       ` Shaohua Li
2005-06-03 14:25         ` Ashok Raj
2005-06-02 12:57 ` [patch 3/5] x86_64: CPU hotplug sibling map cleanup Ashok Raj
2005-06-02 12:57 ` [patch 4/5] x86_64: Dont use broadcast shortcut to make it cpu hotplug safe Ashok Raj
2005-06-02 12:58 ` [patch 5/5] x86_64: Provide ability to choose using shortcuts for IPI in flat mode Ashok Raj
2005-06-02 20:10   ` Zwane Mwaikambo
2005-06-02 23:15     ` Ashok Raj
2005-06-02 20:25 ` [patch 0/5] x86_64 CPU hotplug patch series Zwane Mwaikambo
2005-06-03 16:35   ` Andi Kleen
2005-06-03 17:15     ` Ashok Raj
