* [PATCH 0/4] cpumask: fixups and additions
@ 2008-12-11 11:28 Mike Travis
  2008-12-11 11:28 ` [PATCH 1/4] x86: fix assign_irq_vector boot up problem Mike Travis
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Mike Travis @ 2008-12-11 11:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Rusty Russell, H. Peter Anvin, Thomas Gleixner, linux-kernel


The following patches are included:

  * x86: fix assign_irq_vector boot up problem.
	Fix boot up problem on Intel SATA AHCI disks.

  * x86: fix cpu_mask_to_apicid_and to include cpu_online_mask.
	Fix potential problem with offline cpus.

  * cpumask: use maxcpus=NUM to extend the cpu limit as well as restrict
    the limit.
	Allow adding additional cpus with maxcpus kernel start parameter.

  * cpumask: add sysfs displays for configured and disabled cpu maps
	Display both the configured kernel_max cpu index (NR_CPUS-1) and
	the cpus in the system that are disabled because they exceed
	the NR_CPUS limit.

Based on:
  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
+ git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git/cpus4096
+ git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-ingo.git


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/4] x86: fix assign_irq_vector boot up problem
  2008-12-11 11:28 [PATCH 0/4] cpumask: fixups and additions Mike Travis
@ 2008-12-11 11:28 ` Mike Travis
  2008-12-12  8:27   ` Rusty Russell
  2008-12-11 11:28 ` [PATCH 2/4] x86: fix cpu_mask_to_apicid_and to include cpu_online_mask Mike Travis
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Mike Travis @ 2008-12-11 11:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Rusty Russell, H. Peter Anvin, Thomas Gleixner, linux-kernel

[-- Attachment #1: x86:fix-assign_irq_vector-boot-up-problem.patch --]
[-- Type: text/plain, Size: 1473 bytes --]

Impact: fix boot up problem.

Fix a problem encountered with the Intel SATA-AHCI disk driver
right at system startup.  The cpumask_intersects() check really needs
to be a 3-way intersect, and since we need a cpumask_var_t later on
anyway, just use it for the 3-way intersect as well.

Based on:
  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
+ git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git/cpus4096
+ git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-ingo.git

Signed-off-by: Mike Travis <travis@sgi.com>
---
 arch/x86/kernel/io_apic.c |   14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

--- linux-2.6-for-ingo.orig/arch/x86/kernel/io_apic.c
+++ linux-2.6-for-ingo/arch/x86/kernel/io_apic.c
@@ -1084,15 +1084,19 @@ static int __assign_irq_vector(int irq, 
 	if ((cfg->move_in_progress) || cfg->move_cleanup_count)
 		return -EBUSY;
 
+	if (!alloc_cpumask_var(&tmp_mask, GFP_ATOMIC))
+		return -ENOMEM;
+
 	old_vector = cfg->vector;
 	if (old_vector) {
-		if (!cpumask_intersects(mask, cpu_online_mask))
-			return 0;
+		cpumask_and(tmp_mask, mask, cpu_online_mask);
+		cpumask_and(tmp_mask, cfg->domain, tmp_mask);
+		if (!cpumask_empty(tmp_mask)) {
+			free_cpumask_var(tmp_mask);
+ 			return 0;
+		}
 	}
 
-	if (!alloc_cpumask_var(&tmp_mask, GFP_ATOMIC))
-		return -ENOMEM;
-
 	/* Only try and allocate irqs on cpus that are present */
 	err = -ENOSPC;
 	for_each_cpu_and(cpu, mask, cpu_online_mask) {


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 2/4] x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
  2008-12-11 11:28 [PATCH 0/4] cpumask: fixups and additions Mike Travis
  2008-12-11 11:28 ` [PATCH 1/4] x86: fix assign_irq_vector boot up problem Mike Travis
@ 2008-12-11 11:28 ` Mike Travis
  2008-12-12 11:06   ` Rusty Russell
  2008-12-11 11:28 ` [PATCH 3/4] cpumask: use maxcpus=NUM to extend the cpu limit as well as restrict the limit Mike Travis
  2008-12-11 11:28 ` [PATCH 4/4] cpumask: add sysfs displays for configured and disabled cpu maps Mike Travis
  3 siblings, 1 reply; 18+ messages in thread
From: Mike Travis @ 2008-12-11 11:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Rusty Russell, H. Peter Anvin, Thomas Gleixner, linux-kernel

[-- Attachment #1: fix-cpu_mask_to_apicid_and.patch --]
[-- Type: text/plain, Size: 8557 bytes --]

Impact: fix potential problem.

In determining the destination apicid, there are usually three cpumasks
that are considered: the incoming cpumask arg, cfg->domain and the
cpu_online_mask.  Since we are just introducing the cpu_mask_to_apicid_and
function, make sure it includes the cpu_online_mask in its evaluation.
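
For illustration only (a sketch, not part of the patch): the old-style
callers, quoted later in this thread, effectively computed the destination
as a three-way AND along the lines of

	cpumask_and(tmp, mask, cfg->domain);
	cpumask_and(tmp, tmp, cpu_online_mask);
	dest = cpu_mask_to_apicid(tmp);

so cpu_mask_to_apicid_and(mask, andmask) should behave like
cpu_mask_to_apicid(mask & andmask & cpu_online_mask).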

Based on:
  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
+ git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git/cpus4096
+ git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-ingo.git

Signed-off-by: Mike Travis <travis@sgi.com>
---
There are two io_apic.c functions that did not previously use the
cpu_online_mask:  setup_IO_APIC_irq and msi_compose_msg.  Both of these
simply used cpu_mask_to_apicid(cfg->domain && TARGET_CPUS), though I'm not
sure why you would want to set a destination apic id for an offline cpu.
NUMAQ is the only arch that can potentially return a TARGET_CPUS set that
includes offlined cpus.  (Is this an error?)
---
 arch/x86/include/asm/bigsmp/apic.h            |    4 +-
 arch/x86/include/asm/es7000/apic.h            |   40 +++++++++++---------------
 arch/x86/include/asm/mach-default/mach_apic.h |    3 +
 arch/x86/include/asm/summit/apic.h            |   30 +++++++++++--------
 arch/x86/kernel/genapic_flat_64.c             |    4 +-
 arch/x86/kernel/genx2apic_cluster.c           |    4 +-
 arch/x86/kernel/genx2apic_phys.c              |    4 +-
 arch/x86/kernel/genx2apic_uv_x.c              |    4 +-
 8 files changed, 52 insertions(+), 41 deletions(-)

--- linux-2.6-for-ingo.orig/arch/x86/include/asm/bigsmp/apic.h
+++ linux-2.6-for-ingo/arch/x86/include/asm/bigsmp/apic.h
@@ -138,7 +138,9 @@ static inline unsigned int cpu_mask_to_a
 	 * We're using fixed IRQ delivery, can only return one phys APIC ID.
 	 * May as well be the first.
 	 */
-	cpu = cpumask_any_and(cpumask, andmask);
+	for_each_cpu_and(cpu, cpumask, andmask)
+		if (cpumask_test_cpu(cpu, cpu_online_mask))
+			break;
 	if (cpu < nr_cpu_ids)
 		return cpu_to_logical_apicid(cpu);
 
--- linux-2.6-for-ingo.orig/arch/x86/include/asm/es7000/apic.h
+++ linux-2.6-for-ingo/arch/x86/include/asm/es7000/apic.h
@@ -214,51 +214,47 @@ static inline unsigned int cpu_mask_to_a
 	return apicid;
 }
 
-static inline unsigned int cpu_mask_to_apicid_and(const struct cpumask *cpumask,
+
+static inline unsigned int cpu_mask_to_apicid_and(const struct cpumask *inmask,
 						  const struct cpumask *andmask)
 {
 	int num_bits_set;
-	int num_bits_set2;
 	int cpus_found = 0;
 	int cpu;
-	int apicid = 0;
+	int apicid = cpu_to_logical_apicid(0);
+	cpumask_var_t cpumask;
+
+	if (!alloc_cpumask_var(&cpumask, GFP_ATOMIC))
+		return apicid;
+
+	cpumask_and(cpumask, inmask, andmask);
+	cpumask_and(cpumask, cpumask, cpu_online_mask);
 
 	num_bits_set = cpumask_weight(cpumask);
-	num_bits_set2 = cpumask_weight(andmask);
-	num_bits_set = min(num_bits_set, num_bits_set2);
 	/* Return id to all */
-	if (num_bits_set >= nr_cpu_ids)
-#if defined CONFIG_ES7000_CLUSTERED_APIC
-		return 0xFF;
-#else
-		return cpu_to_logical_apicid(0);
-#endif
+	if (num_bits_set == NR_CPUS)
+		goto exit;
 	/*
 	 * The cpus in the mask must all be on the apic cluster.  If are not
 	 * on the same apicid cluster return default value of TARGET_CPUS.
 	 */
-	cpu = cpumask_first_and(cpumask, andmask);
+	cpu = cpumask_first(cpumask);
 	apicid = cpu_to_logical_apicid(cpu);
-
 	while (cpus_found < num_bits_set) {
-		if (cpumask_test_cpu(cpu, cpumask) &&
-		    cpumask_test_cpu(cpu, andmask)) {
+		if (cpumask_test_cpu(cpu, cpumask)) {
 			int new_apicid = cpu_to_logical_apicid(cpu);
 			if (apicid_cluster(apicid) !=
-					apicid_cluster(new_apicid)) {
-				printk(KERN_WARNING
-					"%s: Not a valid mask!\n", __func__);
-#if defined CONFIG_ES7000_CLUSTERED_APIC
-				return 0xFF;
-#else
+					apicid_cluster(new_apicid)){
+				printk ("%s: Not a valid mask!\n", __func__);
 				return cpu_to_logical_apicid(0);
-#endif
 			}
 			apicid = new_apicid;
 			cpus_found++;
 		}
 		cpu++;
 	}
+exit:
+	free_cpumask_var(cpumask);
 	return apicid;
 }
 
--- linux-2.6-for-ingo.orig/arch/x86/include/asm/mach-default/mach_apic.h
+++ linux-2.6-for-ingo/arch/x86/include/asm/mach-default/mach_apic.h
@@ -72,8 +72,9 @@ static inline unsigned int cpu_mask_to_a
 {
 	unsigned long mask1 = cpumask_bits(cpumask)[0];
 	unsigned long mask2 = cpumask_bits(andmask)[0];
+	unsigned long mask3 = cpumask_bits(cpu_online_mask)[0];
 
-	return (unsigned int)(mask1 & mask2);
+	return (unsigned int)(mask1 & mask2 & mask3);
 }
 
 static inline u32 phys_pkg_id(u32 cpuid_apic, int index_msb)
--- linux-2.6-for-ingo.orig/arch/x86/include/asm/summit/apic.h
+++ linux-2.6-for-ingo/arch/x86/include/asm/summit/apic.h
@@ -170,35 +170,37 @@ static inline unsigned int cpu_mask_to_a
 	return apicid;
 }
 
-static inline unsigned int cpu_mask_to_apicid_and(const struct cpumask *cpumask,
+static inline unsigned int cpu_mask_to_apicid_and(const struct cpumask *inmask,
 						  const struct cpumask *andmask)
 {
 	int num_bits_set;
-	int num_bits_set2;
 	int cpus_found = 0;
 	int cpu;
-	int apicid = 0;
+	int apicid = 0xFF;
+	cpumask_var_t cpumask;
+
+	if (!alloc_cpumask_var(&cpumask, GFP_ATOMIC))
+		return (int) 0xFF;
+
+	cpumask_and(cpumask, inmask, andmask);
+	cpumask_and(cpumask, cpumask, cpu_online_mask);
 
 	num_bits_set = cpumask_weight(cpumask);
-	num_bits_set2 = cpumask_weight(andmask);
-	num_bits_set = min(num_bits_set, num_bits_set2);
 	/* Return id to all */
-	if (num_bits_set >= nr_cpu_ids)
-		return 0xFF;
+	if (num_bits_set == nr_cpu_ids)
+		goto exit;
 	/*
 	 * The cpus in the mask must all be on the apic cluster.  If are not
 	 * on the same apicid cluster return default value of TARGET_CPUS.
 	 */
-	cpu = cpumask_first_and(cpumask, andmask);
+	cpu = cpumask_first(cpumask);
 	apicid = cpu_to_logical_apicid(cpu);
 	while (cpus_found < num_bits_set) {
-		if (cpumask_test_cpu(cpu, cpumask)
-		    && cpumask_test_cpu(cpu, andmask)) {
+		if (cpumask_test_cpu(cpu, cpumask)) {
 			int new_apicid = cpu_to_logical_apicid(cpu);
 			if (apicid_cluster(apicid) !=
-					apicid_cluster(new_apicid)) {
-				printk(KERN_WARNING
-					"%s: Not a valid mask!\n", __func__);
+					apicid_cluster(new_apicid)){
+				printk ("%s: Not a valid mask!\n", __func__);
 				return 0xFF;
 			}
 			apicid = apicid | new_apicid;
@@ -206,6 +208,8 @@ static inline unsigned int cpu_mask_to_a
 		}
 		cpu++;
 	}
+exit:
+	free_cpumask_var(cpumask);
 	return apicid;
 }
 
--- linux-2.6-for-ingo.orig/arch/x86/kernel/genapic_flat_64.c
+++ linux-2.6-for-ingo/arch/x86/kernel/genapic_flat_64.c
@@ -276,7 +276,9 @@ physflat_cpu_mask_to_apicid_and(const st
 	 * We're using fixed IRQ delivery, can only return one phys APIC ID.
 	 * May as well be the first.
 	 */
-	cpu = cpumask_any_and(cpumask, andmask);
+	for_each_cpu_and(cpu, cpumask, andmask)
+		if (cpumask_test_cpu(cpu, cpu_online_mask))
+			break;
 	if (cpu < nr_cpu_ids)
 		return per_cpu(x86_cpu_to_apicid, cpu);
 	return BAD_APICID;
--- linux-2.6-for-ingo.orig/arch/x86/kernel/genx2apic_cluster.c
+++ linux-2.6-for-ingo/arch/x86/kernel/genx2apic_cluster.c
@@ -133,7 +133,9 @@ static unsigned int x2apic_cpu_mask_to_a
 	 * We're using fixed IRQ delivery, can only return one phys APIC ID.
 	 * May as well be the first.
 	 */
-	cpu = cpumask_any_and(cpumask, andmask);
+	for_each_cpu_and(cpu, cpumask, andmask)
+		if (cpumask_test_cpu(cpu, cpu_online_mask))
+			break;
 	if (cpu < nr_cpu_ids)
 		return per_cpu(x86_cpu_to_apicid, cpu);
 	return BAD_APICID;
--- linux-2.6-for-ingo.orig/arch/x86/kernel/genx2apic_phys.c
+++ linux-2.6-for-ingo/arch/x86/kernel/genx2apic_phys.c
@@ -132,7 +132,9 @@ static unsigned int x2apic_cpu_mask_to_a
 	 * We're using fixed IRQ delivery, can only return one phys APIC ID.
 	 * May as well be the first.
 	 */
-	cpu = cpumask_any_and(cpumask, andmask);
+	for_each_cpu_and(cpu, cpumask, andmask)
+		if (cpumask_test_cpu(cpu, cpu_online_mask))
+			break;
 	if (cpu < nr_cpu_ids)
 		return per_cpu(x86_cpu_to_apicid, cpu);
 	return BAD_APICID;
--- linux-2.6-for-ingo.orig/arch/x86/kernel/genx2apic_uv_x.c
+++ linux-2.6-for-ingo/arch/x86/kernel/genx2apic_uv_x.c
@@ -188,7 +188,9 @@ static unsigned int uv_cpu_mask_to_apici
 	 * We're using fixed IRQ delivery, can only return one phys APIC ID.
 	 * May as well be the first.
 	 */
-	cpu = cpumask_any_and(cpumask, andmask);
+	for_each_cpu_and(cpu, cpumask, andmask)
+		if (cpumask_test_cpu(cpu, cpu_online_mask))
+			break;
 	if (cpu < nr_cpu_ids)
 		return per_cpu(x86_cpu_to_apicid, cpu);
 	return BAD_APICID;


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 3/4] cpumask: use maxcpus=NUM to extend the cpu limit as well as restrict the limit
  2008-12-11 11:28 [PATCH 0/4] cpumask: fixups and additions Mike Travis
  2008-12-11 11:28 ` [PATCH 1/4] x86: fix assign_irq_vector boot up problem Mike Travis
  2008-12-11 11:28 ` [PATCH 2/4] x86: fix cpu_mask_to_apicid_and to include cpu_online_mask Mike Travis
@ 2008-12-11 11:28 ` Mike Travis
  2008-12-11 13:41   ` Heiko Carstens
  2008-12-12 11:41   ` Rusty Russell
  2008-12-11 11:28 ` [PATCH 4/4] cpumask: add sysfs displays for configured and disabled cpu maps Mike Travis
  3 siblings, 2 replies; 18+ messages in thread
From: Mike Travis @ 2008-12-11 11:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Rusty Russell, H. Peter Anvin, Thomas Gleixner, linux-kernel

[-- Attachment #1: x86:use-maxcpus.patch --]
[-- Type: text/plain, Size: 6594 bytes --]

Impact: allow adding additional cpus.

Use the maxcpus=NUM kernel parameter to extend the number of possible cpus as
well as (currently) limit them.  Any cpus >= the number of present cpus will
be disabled.
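
For example (illustrative numbers only): booting a system that has 16
present cpus with "maxcpus=64" would make 64 cpus possible; the 48 cpus
that are not present come up disabled/offline.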

The ability to HOTPLUG ON cpus that are "possible" but not "present" is
dealt with in a later patch.

Based on:
  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
+ git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git/cpus4096
+ git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-ingo.git

Signed-off-by: Mike Travis <travis@sgi.com>
---
 Documentation/kernel-parameters.txt |    6 +++++-
 arch/x86/kernel/alternative.c       |    2 +-
 arch/x86/kernel/apic.c              |   12 +++++++++---
 arch/x86/kernel/smpboot.c           |   18 ++++++++++++++----
 arch/x86/kernel/visws_quirks.c      |    4 ++--
 include/linux/smp.h                 |   19 ++++++++++++++++++-
 init/main.c                         |   14 ++++++--------
 7 files changed, 55 insertions(+), 20 deletions(-)

--- linux-2.6-for-ingo.orig/Documentation/kernel-parameters.txt
+++ linux-2.6-for-ingo/Documentation/kernel-parameters.txt
@@ -1201,7 +1201,11 @@ and is between 256 and 4096 characters. 
 			should make use of.  maxcpus=n : n >= 0 limits the
 			kernel to using 'n' processors.  n=0 is a special case,
 			it is equivalent to "nosmp", which also disables
-			the IO APIC.
+			the IO APIC.  On [X86] maxcpus can also be used to
+			extend the number of possible cpus to overcome ACPI
+			tables that do not indicate disabled cpus, as well as
+			allow for additional cpus to be HOT PLUGGED in.
+			Format: <0-NR_CPUS>
 
 	max_addr=nn[KMG]	[KNL,BOOT,ia64] All physical memory greater than
 			or equal to this physical address is ignored.
--- linux-2.6-for-ingo.orig/arch/x86/kernel/alternative.c
+++ linux-2.6-for-ingo/arch/x86/kernel/alternative.c
@@ -444,7 +444,7 @@ void __init alternative_instructions(voi
 					    _text, _etext);
 
 		/* Only switch to UP mode if we don't immediately boot others */
-		if (num_present_cpus() == 1 || setup_max_cpus <= 1)
+		if (num_present_cpus() == 1 || maxcpus() <= 1)
 			alternatives_smp_switch(0);
 	}
 #endif
--- linux-2.6-for-ingo.orig/arch/x86/kernel/apic.c
+++ linux-2.6-for-ingo/arch/x86/kernel/apic.c
@@ -1845,9 +1845,15 @@ void __cpuinit generic_processor_info(in
 	}
 	apic_version[apicid] = version;
 
-	if (num_processors >= NR_CPUS) {
-		printk(KERN_WARNING "WARNING: NR_CPUS limit of %i reached."
-			"  Processor ignored.\n", NR_CPUS);
+	if (num_processors >= maxcpus()) {
+		int max = maxcpus();
+		int thiscpu = max + disabled_cpus;
+
+		printk(KERN_WARNING
+			"ACPI: NR_CPUS/maxcpus limit of %i reached."
+			"  Processor %d/0x%x ignored.\n", max, thiscpu, apicid);
+
+		disabled_cpus++;
 		return;
 	}
 
--- linux-2.6-for-ingo.orig/arch/x86/kernel/smpboot.c
+++ linux-2.6-for-ingo/arch/x86/kernel/smpboot.c
@@ -1266,7 +1266,7 @@ void __init native_smp_cpus_done(unsigne
  *
  * Three ways to find out the number of additional hotplug CPUs:
  * - If the BIOS specified disabled CPUs in ACPI/mptables use that.
- * - The user can overwrite it with additional_cpus=NUM
+ * - The user can overwrite it with maxcpus=NUM
  * - Otherwise don't reserve additional CPUs.
  * We do this because additional CPUs waste a lot of memory.
  * -AK
@@ -1279,9 +1279,19 @@ __init void prefill_possible_map(void)
 	if (!num_processors)
 		num_processors = 1;
 
-	possible = num_processors + disabled_cpus;
-	if (possible > NR_CPUS)
-		possible = NR_CPUS;
+	if (setup_max_cpus == -1)	/* not specified */
+		possible = num_processors + disabled_cpus;
+	else if (setup_max_cpus == 0)	/* UP mode forced */
+		possible = 1;
+	else				/* user specified */
+		possible = setup_max_cpus;
+
+	if (possible > CONFIG_NR_CPUS) {
+		printk(KERN_WARNING
+			"%d Processors exceeds NR_CPUS limit of %d\n",
+			possible, CONFIG_NR_CPUS);
+		possible = CONFIG_NR_CPUS;
+	}
 
 	printk(KERN_INFO "SMP: Allowing %d CPUs, %d hotplug CPUs\n",
 		possible, max_t(int, possible - num_processors, 0));
--- linux-2.6-for-ingo.orig/arch/x86/kernel/visws_quirks.c
+++ linux-2.6-for-ingo/arch/x86/kernel/visws_quirks.c
@@ -228,8 +228,8 @@ static int __init visws_find_smp_config(
 		ncpus = CO_CPU_MAX;
 	}
 
-	if (ncpus > setup_max_cpus)
-		ncpus = setup_max_cpus;
+	if (ncpus > maxcpus())
+		ncpus = maxcpus();
 
 #ifdef CONFIG_X86_LOCAL_APIC
 	smp_found_config = 1;
--- linux-2.6-for-ingo.orig/include/linux/smp.h
+++ linux-2.6-for-ingo/include/linux/smp.h
@@ -112,7 +112,18 @@ int on_each_cpu(void (*func) (void *info
  */
 void smp_prepare_boot_cpu(void);
 
-extern unsigned int setup_max_cpus;
+extern int setup_max_cpus;
+static inline int maxcpus(void)
+{
+	int maxcpus = setup_max_cpus;
+
+	if (maxcpus == -1 || maxcpus > CONFIG_NR_CPUS)
+		maxcpus = CONFIG_NR_CPUS;
+	else if (maxcpus == 0)
+		maxcpus = 1;
+
+	return maxcpus;
+}
 
 #else /* !SMP */
 
@@ -149,6 +160,12 @@ static inline void smp_send_reschedule(i
 static inline void init_call_single_data(void)
 {
 }
+
+static inline int maxcpus(void)
+{
+	return 1;
+}
+
 #endif /* !SMP */
 
 /*
--- linux-2.6-for-ingo.orig/init/main.c
+++ linux-2.6-for-ingo/init/main.c
@@ -132,7 +132,7 @@ static char *ramdisk_execute_command;
 
 #ifdef CONFIG_SMP
 /* Setup configured maximum number of CPUs to activate */
-unsigned int __initdata setup_max_cpus = NR_CPUS;
+int setup_max_cpus = -1;
 
 /*
  * Setup routine for controlling SMP activation
@@ -157,7 +157,7 @@ static int __init nosmp(char *str)
 
 early_param("nosmp", nosmp);
 
-static int __init maxcpus(char *str)
+static int __init set_maxcpus(char *str)
 {
 	get_option(&str, &setup_max_cpus);
 	if (setup_max_cpus == 0)
@@ -166,9 +166,7 @@ static int __init maxcpus(char *str)
 	return 0;
 }
 
-early_param("maxcpus", maxcpus);
-#else
-#define setup_max_cpus NR_CPUS
+early_param("maxcpus", set_maxcpus);
 #endif
 
 /*
@@ -425,7 +423,7 @@ static void __init smp_init(void)
 
 	/* FIXME: This should be done in userspace --RR */
 	for_each_present_cpu(cpu) {
-		if (num_online_cpus() >= setup_max_cpus)
+		if (num_online_cpus() >= maxcpus())
 			break;
 		if (!cpu_online(cpu))
 			cpu_up(cpu);
@@ -433,7 +431,7 @@ static void __init smp_init(void)
 
 	/* Any cleanup work */
 	printk(KERN_INFO "Brought up %ld CPUs\n", (long)num_online_cpus());
-	smp_cpus_done(setup_max_cpus);
+	smp_cpus_done(maxcpus());
 }
 
 #endif
@@ -857,7 +855,7 @@ static int __init kernel_init(void * unu
 
 	cad_pid = task_pid(current);
 
-	smp_prepare_cpus(setup_max_cpus);
+	smp_prepare_cpus(maxcpus());
 
 	do_pre_smp_initcalls();
 	start_boot_trace();


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 4/4] cpumask: add sysfs displays for configured and disabled cpu maps
  2008-12-11 11:28 [PATCH 0/4] cpumask: fixups and additions Mike Travis
                   ` (2 preceding siblings ...)
  2008-12-11 11:28 ` [PATCH 3/4] cpumask: use maxcpus=NUM to extend the cpu limit as well as restrict the limit Mike Travis
@ 2008-12-11 11:28 ` Mike Travis
  2008-12-12 11:44   ` Rusty Russell
  3 siblings, 1 reply; 18+ messages in thread
From: Mike Travis @ 2008-12-11 11:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Rusty Russell, H. Peter Anvin, Thomas Gleixner, linux-kernel

[-- Attachment #1: cpumask:add-sysfs-files.patch --]
[-- Type: text/plain, Size: 4127 bytes --]

Impact: add new functionality.

Add sysfs files "kernel_max" and "offline" to display the max CPU index
allowed (NR_CPUS-1), and the map of cpus that are offline. 

Cpus can be offlined via HOTPLUG, disabled by the BIOS ACPI tables, or
left offline because they exceed the number of cpus allowed by the
NR_CPUS config option or the "maxcpus=NUM" kernel start parameter.  The
"maxcpus=NUM" parameter can also extend the number of possible cpus
allowed, in which case the cpus not present at startup will be in the
offline state.  (These cpus can be HOTPLUGGED ON after system startup
[pending a follow-on patch to provide the capability via the
/sys/devices/system/cpu/cpuN/online mechanism to bring them online.])

By design, the "offlined cpus > possible cpus" display will always
use the following formats:

  * all possible cpus online:   "x$"    or "x-y$" 
  * some possible cpus offline: ".*,x$" or ".*,x-y$"

where:
  x == number of possible cpus (nr_cpu_ids); and
  y == number of cpus >= NR_CPUS or maxcpus (if y > x).
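
As an illustrative (hypothetical) example, following the format above:
on a kernel built with NR_CPUS=128, with nr_cpu_ids == 4, cpus 2-3
offlined via hotplug, and total_cpus == 8, the files would read:

  /sys/devices/system/cpu/kernel_max:  127
  /sys/devices/system/cpu/offline:     2-3,4-7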

One use of this feature is for distros to select (or configure) the
appropriate kernel to install for the resident system.

Notes:
  * cpus offlined <= possible cpus will be printed for all architectures.
  * cpus offlined >  possible cpus will only be printed for arches that
  	set 'total_cpus' [X86 only in this patch].

Based on:
  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
+ git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git/cpus4096
+ git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-ingo.git

Signed-off-by: Mike Travis <travis@sgi.com>
---
 arch/x86/kernel/smpboot.c |    2 ++
 drivers/base/cpu.c        |   45 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/smp.h       |    3 +++
 3 files changed, 50 insertions(+)

--- linux-2.6-for-ingo.orig/arch/x86/kernel/smpboot.c
+++ linux-2.6-for-ingo/arch/x86/kernel/smpboot.c
@@ -1286,6 +1286,8 @@ __init void prefill_possible_map(void)
 	else				/* user specified */
 		possible = setup_max_cpus;
 
+	total_cpus = max_t(int, possible, num_processors + disabled_cpus);
+
 	if (possible > CONFIG_NR_CPUS) {
 		printk(KERN_WARNING
 			"%d Processors exceeds NR_CPUS limit of %d\n",
--- linux-2.6-for-ingo.orig/drivers/base/cpu.c
+++ linux-2.6-for-ingo/drivers/base/cpu.c
@@ -128,10 +128,55 @@ print_cpus_func(online);
 print_cpus_func(possible);
 print_cpus_func(present);
 
+/*
+ * Print values for NR_CPUS and offlined cpus
+ */
+static ssize_t print_cpus_kernel_max(struct sysdev_class *class, char *buf)
+{
+	int n = snprintf(buf, PAGE_SIZE-2, "%d\n", CONFIG_NR_CPUS - 1);
+	return n;
+}
+static SYSDEV_CLASS_ATTR(kernel_max, 0444, print_cpus_kernel_max, NULL);
+
+/* arch-optional setting to enable display of offline cpus >= nr_cpu_ids */
+unsigned int total_cpus;
+
+static ssize_t print_cpus_offline(struct sysdev_class *class, char *buf)
+{
+	int n = 0, len = PAGE_SIZE-2;
+	cpumask_var_t offline;
+
+	/* display offline cpus < nr_cpu_ids */
+	if (!alloc_cpumask_var(&offline, GFP_KERNEL))
+		goto next;
+	cpumask_complement(offline, cpu_online_mask);
+	n = cpulist_scnprintf(buf, len, offline);
+	free_cpumask_var(offline);
+
+next:
+	/* display offline cpus >= nr_cpu_ids */
+	if (total_cpus && nr_cpu_ids < total_cpus) {
+		if (n && n < len)
+			buf[n++] = ',';
+
+		if (nr_cpu_ids == total_cpus-1)
+			n += snprintf(&buf[n], len - n, "%d", nr_cpu_ids);
+		else
+			n += snprintf(&buf[n], len - n, "%d-%d",
+						      nr_cpu_ids, total_cpus-1);
+	}
+
+	n += snprintf(&buf[n], len - n, "\n");
+	return n;
+}
+static SYSDEV_CLASS_ATTR(offline, 0444, print_cpus_offline, NULL);
+
 static struct sysdev_class_attribute *cpu_state_attr[] = {
 	&attr_online_map,
 	&attr_possible_map,
 	&attr_present_map,
+	&attr_kernel_max,
+	&attr_offline,
 };
 
 static int cpu_states_init(void)
--- linux-2.6-for-ingo.orig/include/linux/smp.h
+++ linux-2.6-for-ingo/include/linux/smp.h
@@ -21,6 +21,9 @@ struct call_single_data {
 	u16 priv;
 };
 
+/* total number of cpus in this system (may exceed NR_CPUS) */
+extern unsigned int total_cpus;
+
 #ifdef CONFIG_SMP
 
 #include <linux/preempt.h>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/4] cpumask: use maxcpus=NUM to extend the cpu limit as well as restrict the limit
  2008-12-11 11:28 ` [PATCH 3/4] cpumask: use maxcpus=NUM to extend the cpu limit as well as restrict the limit Mike Travis
@ 2008-12-11 13:41   ` Heiko Carstens
  2008-12-11 18:19     ` Mike Travis
  2008-12-12 11:41   ` Rusty Russell
  1 sibling, 1 reply; 18+ messages in thread
From: Heiko Carstens @ 2008-12-11 13:41 UTC (permalink / raw)
  To: Mike Travis
  Cc: Ingo Molnar, Rusty Russell, H. Peter Anvin, Thomas Gleixner,
	linux-kernel

On Thu, Dec 11, 2008 at 03:28:09AM -0800, Mike Travis wrote:
> Impact: allow adding additional cpus.
> 
> Use maxcpus=NUM kernel parameter to extend the number of possible cpus as well
> as (currently) limit them.  Any cpus >= number of present cpus will disabled.
> 
> The ability to HOTPLUG ON cpus that are "possible" but not "present" is
> dealt with in a later patch.

Hm.. documentation/kernel-parameters.txt says:

	maxcpus=	[SMP] Maximum number of processors that an SMP kernel
			should make use of.  maxcpus=n : n >= 0 limits the
			kernel to using 'n' processors.	 n=0 is a special case,
			it is equivalent to "nosmp", which also disables
			the IO APIC.

but documentation/cpu-hotplug.txt says:

maxcpus=n    Restrict boot time cpus to n. Say if you have 4 cpus, using
	     maxcpus=2 will only boot 2. You can choose to bring the
	     other cpus later online, read FAQ's for more info.

It used to be (implementation-wise) that maxcpus didn't influence the number
of possible cpus but just indicated how many cpus were brought online at startup
of the kernel, which is what cpu-hotplug.txt describes.

Other present cpus would appear offline and could be brought online later.

For s390 I added the possible_cpus kernel parameter back then, since my
understanding back then was that maxcpus doesn't and shouldn't influence the
number of possible cpus:

possible_cpus=n		[s390 only] use this to set hotpluggable cpus.
			This option sets possible_cpus bits in
			cpu_possible_map. Thus keeping the numbers of bits set
			constant even if the machine gets rebooted.

Dunno... it all looks like a mess ;)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/4] cpumask: use maxcpus=NUM to extend the cpu limit as well as restrict the limit
  2008-12-11 13:41   ` Heiko Carstens
@ 2008-12-11 18:19     ` Mike Travis
  2008-12-12 10:03       ` Heiko Carstens
  0 siblings, 1 reply; 18+ messages in thread
From: Mike Travis @ 2008-12-11 18:19 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Ingo Molnar, Rusty Russell, H. Peter Anvin, Thomas Gleixner,
	linux-kernel

Heiko Carstens wrote:
> On Thu, Dec 11, 2008 at 03:28:09AM -0800, Mike Travis wrote:
>> Impact: allow adding additional cpus.
>>
>> Use maxcpus=NUM kernel parameter to extend the number of possible cpus as well
>> as (currently) limit them.  Any cpus >= number of present cpus will disabled.
>>
>> The ability to HOTPLUG ON cpus that are "possible" but not "present" is
>> dealt with in a later patch.
> 
> Hm.. documentation/kernel-parameters.txt says:
> 
> 	maxcpus=	[SMP] Maximum number of processors that an SMP kernel
> 			should make use of.  maxcpus=n : n >= 0 limits the
> 			kernel to using 'n' processors.	 n=0 is a special case,
> 			it is equivalent to "nosmp", which also disables
> 			the IO APIC.
> 
> but documentation/cpu-hotplug.txt says:
> 
> maxcpus=n    Restrict boot time cpus to n. Say if you have 4 cpus, using
> 	     maxcpus=2 will only boot 2. You can choose to bring the
> 	     other cpus later online, read FAQ's for more info.
> 
> It used to be (implementation wise) that maxcpus doesn't influence the number
> of possible cpus but just indicated how many cpus were brought online at startup
> of the kernel. Which is what cpu-hotplug.txt describes.
> 
> Other present cpus would appear offline and could be brought online later.
> 
> For s390 I added the possible_cpus kernel parameter back then, since my
> understanding back then was that maxcpus doesn't and shouldn't influence the
> number of possible cpus:
> 
> possible_cpus=n		[s390 only] use this to set hotpluggable cpus.
> 			This option sets possible_cpus bits in
> 			cpu_possible_map. Thus keeping the numbers of bits set
> 			constant even if the machine gets rebooted.
> 
> Dunno... it all looks like a mess ;)

Hmm, I hadn't noticed that.  For a while the X86 devel kernel had an
"additional_cpus=n" parameter, which was also a bit confusing.  Say you
wanted, 64 total, you had to give the increment over how many you already
had [e.g., (want)64 - (have)16  = (additional_cpus=)48.]

I just figured that re-using the same kernel parameter was better than adding
another.  But I'm willing to go either way.

Btw, I did alter the Documentation/kernel-parameters.txt file:

        maxcpus=        [SMP] Maximum number of processors that an SMP kernel
                        should make use of.  maxcpus=n : n >= 0 limits the
                        kernel to using 'n' processors.  n=0 is a special case,
                        it is equivalent to "nosmp", which also disables
                        the IO APIC.  On [X86] maxcpus can also be used to
                        extend the number of possible cpus to overcome ACPI
                        tables that do not indicate disabled cpus, as well as
                        allow for additional cpus to be HOT PLUGGED in.
                        Format: <0-NR_CPUS>

In modifying the code, I could not find any arch-specific instances and the only
general case was in the 3 references in init/main.c, so I figured it was pretty
safe to use.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/4] x86: fix assign_irq_vector boot up problem
  2008-12-11 11:28 ` [PATCH 1/4] x86: fix assign_irq_vector boot up problem Mike Travis
@ 2008-12-12  8:27   ` Rusty Russell
  2008-12-12  9:20     ` Ingo Molnar
  0 siblings, 1 reply; 18+ messages in thread
From: Rusty Russell @ 2008-12-12  8:27 UTC (permalink / raw)
  To: Mike Travis; +Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, linux-kernel

On Thursday 11 December 2008 21:58:07 Mike Travis wrote:
> Impact: fix boot up problem.
> 
> Fix a problem encountered with the Intel SATA-AHCI disk driver
> right at system startup.  Cpumask_intersects really needs to be
> a 3-way intersect, and since we need a cpumask_var_t later on,
> then just use it for the 3-way intersect as well.

This one looks fine.

My plan was for Ingo to pull that for-ingo tree into his cpus4096 tree
and take the x86 patches from there.  But he hasn't so maybe I should
take this chance to fold that patch in?

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/4] x86: fix assign_irq_vector boot up problem
  2008-12-12  8:27   ` Rusty Russell
@ 2008-12-12  9:20     ` Ingo Molnar
  2008-12-12 18:10       ` Mike Travis
  0 siblings, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2008-12-12  9:20 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Mike Travis, Ingo Molnar, H. Peter Anvin, Thomas Gleixner, linux-kernel


* Rusty Russell <rusty@rustcorp.com.au> wrote:

> On Thursday 11 December 2008 21:58:07 Mike Travis wrote:
> > Impact: fix boot up problem.
> > 
> > Fix a problem encountered with the Intel SATA-AHCI disk driver
> > right at system startup.  Cpumask_intersects really needs to be
> > a 3-way intersect, and since we need a cpumask_var_t later on,
> > then just use it for the 3-way intersect as well.
> 
> This one looks fine.
> 
> My plan was for Ingo to pull that for-ingo tree into his cpus4096 tree 
> and take the x86 patches from there.  But he hasn't so maybe I should 
> take this chance to fold that patch in?

i have no objections against the bits - just the sparseirq complication 
came in. A lot of effort went into irq/sparseirq's io_apic.c changes and 
cleanup.

So to do this cleanly, i merged those bits into cpus4096 and the 
x86/reboot bits as well - now the plan would be for Mike to send a 
(rebased) series against that base. I tried a plain merge and the 
conflicts in io_apic.c were a horrendous 76 rejects due to the 
irq/sparseirq interaction.  Also, some of the commit subjects looked a 
bit raw, so this bit of the tree needs to be redone.

(Note that the existing cpumask-base+scheduler bits in cpus4096 are 
golden already and we dont have to touch them in any way, it's just the 
new x86 bits and new cpumask infrastructure bits that look odd or 
clashy.)

	Ingo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/4] cpumask: use maxcpus=NUM to extend the cpu limit as well as restrict the limit
  2008-12-11 18:19     ` Mike Travis
@ 2008-12-12 10:03       ` Heiko Carstens
  0 siblings, 0 replies; 18+ messages in thread
From: Heiko Carstens @ 2008-12-12 10:03 UTC (permalink / raw)
  To: Mike Travis
  Cc: Ingo Molnar, Rusty Russell, H. Peter Anvin, Thomas Gleixner,
	linux-kernel

On Thu, Dec 11, 2008 at 10:19:46AM -0800, Mike Travis wrote:
> Heiko Carstens wrote:
> > maxcpus=n    Restrict boot time cpus to n. Say if you have 4 cpus, using
> > 	     maxcpus=2 will only boot 2. You can choose to bring the
> > 	     other cpus later online, read FAQ's for more info.
> > 
> > It used to be (implementation wise) that maxcpus doesn't influence the number
> > of possible cpus but just indicated how many cpus were brought online at startup
> > of the kernel. Which is what cpu-hotplug.txt describes.
> > 
> > Other present cpus would appear offline and could be brought online later.
> > 
> > For s390 I added the possible_cpus kernel parameter back then, since my
> > understanding back then was that maxcpus doesn't and shouldn't influence the
> > number of possible cpus:
> > 
> > possible_cpus=n		[s390 only] use this to set hotpluggable cpus.
> > 			This option sets possible_cpus bits in
> > 			cpu_possible_map. Thus keeping the numbers of bits set
> > 			constant even if the machine gets rebooted.
> 
> Hmm, I hadn't noticed that.  For a while the X86 devel kernel had an
> "additional_cpus=n" parameter, which was also a bit confusing.  Say you
> wanted, 64 total, you had to give the increment over how many you already
> had [e.g., (want)64 - (have)16  = (additional_cpus=)48.]

Yes, we had the additional_cpus parameter as well. But that was too confusing.
Especially if you add a few cpus while the system is running and then reboot
it. The result for (want) would vary for each configuration change and reboot.

That's why I added possible_cpus to s390, then you get (want) == possible_cpus.

> I just figured that re-using the same kernel parameter was better than adding
> another.  But I'm willing to go either way.

Maybe you could go for possible_cpus as well? Having this in sync for several
architectures seems not so bad :)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/4] x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
  2008-12-11 11:28 ` [PATCH 2/4] x86: fix cpu_mask_to_apicid_and to include cpu_online_mask Mike Travis
@ 2008-12-12 11:06   ` Rusty Russell
  2008-12-12 16:37     ` Mike Travis
  0 siblings, 1 reply; 18+ messages in thread
From: Rusty Russell @ 2008-12-12 11:06 UTC (permalink / raw)
  To: Mike Travis; +Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, linux-kernel

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 16339 bytes --]

On Thursday 11 December 2008 21:58:08 Mike Travis wrote:
> Impact: fix potential problem.
> 
> In determining the destination apicid, there are usually three cpumasks
> that are considered: the incoming cpumask arg, cfg->domain and the
> cpu_online_mask.  Since we are just introducing the cpu_mask_to_apicid_and
> function, make sure it includes the cpu_online_mask in it's evaluation.

Yerk.  Can we really "fail" cpu_mask_to_apicid_and with no repercussions?
And does it make sense to try to fix this there?

This is not a new problem with the cpumask patches is it?  I toyed with a
patch which converts flush_tlb_others, and it actually ensures that those
cases never hand an offline mask to cpu_mask_to_apicid_and as a side
effect (others still might).

Patch below for your reading (x86:flush_tlb_others-cpumask-ptr.patch in
my series file).

Rusty.

x86: change flush_tlb_others to take a const struct cpumask *. FIXME: REVIEW

This is made a little more tricky by uv_flush_tlb_others which
actually alters its argument, for an IPI to be sent to the remaining
cpus in the mask.

I solve this by allocating a cpumask_var_t for this case and falling back
to IPI should this fail.

To eliminate temporaries in the caller, all flush_tlb_others implementations
now do the this-cpu-elimination step themselves.

Note also the curious "cpus_or(f->flush_cpumask, cpumask, f->flush_cpumask)"
which has been there since pre-git and yet f->flush_cpumask is always zero
at this point.

diff --git a/arch/x86/include/asm/mmu_context_32.h b/arch/x86/include/asm/mmu_co
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -244,7 +244,8 @@ struct pv_mmu_ops {
 	void (*flush_tlb_user)(void);
 	void (*flush_tlb_kernel)(void);
 	void (*flush_tlb_single)(unsigned long addr);
-	void (*flush_tlb_others)(const cpumask_t *cpus, struct mm_struct *mm,
+	void (*flush_tlb_others)(const struct cpumask *cpus,
+				 struct mm_struct *mm,
 				 unsigned long va);
 
 	/* Hooks for allocating and freeing a pagetable top-level */
@@ -984,10 +985,11 @@ static inline void __flush_tlb_single(un
 	PVOP_VCALL1(pv_mmu_ops.flush_tlb_single, addr);
 }
 
-static inline void flush_tlb_others(cpumask_t cpumask, struct mm_struct *mm,
+static inline void flush_tlb_others(const struct cpumask *cpumask,
+				    struct mm_struct *mm,
 				    unsigned long va)
 {
-	PVOP_VCALL3(pv_mmu_ops.flush_tlb_others, &cpumask, mm, va);
+	PVOP_VCALL3(pv_mmu_ops.flush_tlb_others, cpumask, mm, va);
 }
 
 static inline int paravirt_pgd_alloc(struct mm_struct *mm)
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -113,7 +113,7 @@ static inline void flush_tlb_range(struc
 		__flush_tlb();
 }
 
-static inline void native_flush_tlb_others(const cpumask_t *cpumask,
+static inline void native_flush_tlb_others(const struct cpumask *cpumask,
 					   struct mm_struct *mm,
 					   unsigned long va)
 {
@@ -142,8 +142,8 @@ static inline void flush_tlb_range(struc
 	flush_tlb_mm(vma->vm_mm);
 }
 
-void native_flush_tlb_others(const cpumask_t *cpumask, struct mm_struct *mm,
-			     unsigned long va);
+void native_flush_tlb_others(const struct cpumask *cpumask,
+			     struct mm_struct *mm, unsigned long va);
 
 #define TLBSTATE_OK	1
 #define TLBSTATE_LAZY	2
diff --git a/arch/x86/include/asm/uv/uv_bau.h b/arch/x86/include/asm/uv/uv_bau.h
--- a/arch/x86/include/asm/uv/uv_bau.h
+++ b/arch/x86/include/asm/uv/uv_bau.h
@@ -325,7 +325,8 @@ static inline void bau_cpubits_clear(str
 #define cpubit_isset(cpu, bau_local_cpumask) \
 	test_bit((cpu), (bau_local_cpumask).bits)
 
-extern int uv_flush_tlb_others(cpumask_t *, struct mm_struct *, unsigned long);
+extern int uv_flush_tlb_others(const struct cpumask *,
+			       struct mm_struct *, unsigned long);
 extern void uv_bau_message_intr1(void);
 extern void uv_bau_timeout_intr1(void);
 
diff --git a/arch/x86/kernel/tlb_32.c b/arch/x86/kernel/tlb_32.c
--- a/arch/x86/kernel/tlb_32.c
+++ b/arch/x86/kernel/tlb_32.c
@@ -20,7 +20,7 @@ DEFINE_PER_CPU(struct tlb_state, cpu_tlb
  *	Optimizations Manfred Spraul <manfred@colorfullife.com>
  */
 
-static cpumask_t flush_cpumask;
+static cpumask_var_t flush_cpumask;
 static struct mm_struct *flush_mm;
 static unsigned long flush_va;
 static DEFINE_SPINLOCK(tlbstate_lock);
@@ -93,7 +94,7 @@ void smp_invalidate_interrupt(struct pt_
 
 	cpu = get_cpu();
 
-	if (!cpu_isset(cpu, flush_cpumask))
+	if (!cpumask_test_cpu(cpu, flush_cpumask))
 		goto out;
 		/*
 		 * This was a BUG() but until someone can quote me the
@@ -115,34 +116,21 @@ void smp_invalidate_interrupt(struct pt_
 	}
 	ack_APIC_irq();
 	smp_mb__before_clear_bit();
-	cpu_clear(cpu, flush_cpumask);
+	cpumask_clear_cpu(cpu, flush_cpumask);
 	smp_mb__after_clear_bit();
 out:
 	put_cpu_no_resched();
 	__get_cpu_var(irq_stat).irq_tlb_count++;
 }
 
-void native_flush_tlb_others(const cpumask_t *cpumaskp, struct mm_struct *mm,
-			     unsigned long va)
+void native_flush_tlb_others(const struct cpumask *cpumask,
+			     struct mm_struct *mm, unsigned long va)
 {
-	cpumask_t cpumask = *cpumaskp;
-
 	/*
-	 * A couple of (to be removed) sanity checks:
-	 *
-	 * - current CPU must not be in mask
 	 * - mask must exist :)
 	 */
-	BUG_ON(cpus_empty(cpumask));
-	BUG_ON(cpu_isset(smp_processor_id(), cpumask));
+	BUG_ON(cpumask_empty(cpumask));
 	BUG_ON(!mm);
-
-#ifdef CONFIG_HOTPLUG_CPU
-	/* If a CPU which we ran on has gone down, OK. */
-	cpus_and(cpumask, cpumask, cpu_online_map);
-	if (unlikely(cpus_empty(cpumask)))
-		return;
-#endif
 
 	/*
 	 * i'm not happy about this global shared spinlock in the
@@ -151,9 +139,17 @@ void native_flush_tlb_others(const cpuma
 	 */
 	spin_lock(&tlbstate_lock);
 
+	cpumask_andnot(flush_cpumask, cpumask, cpumask_of(smp_processor_id()));
+#ifdef CONFIG_HOTPLUG_CPU
+	/* If a CPU which we ran on has gone down, OK. */
+	cpumask_and(flush_cpumask, flush_cpumask, cpu_online_mask);
+	if (unlikely(cpumask_empty(flush_cpumask))) {
+		spin_unlock(&tlbstate_lock);
+		return;
+	}
+#endif
 	flush_mm = mm;
 	flush_va = va;
-	cpus_or(flush_cpumask, cpumask, flush_cpumask);
 
 	/*
 	 * Make the above memory operations globally visible before
@@ -164,9 +160,9 @@ void native_flush_tlb_others(const cpuma
 	 * We have to send the IPI only to
 	 * CPUs affected.
 	 */
-	send_IPI_mask(&cpumask, INVALIDATE_TLB_VECTOR);
+	send_IPI_mask(flush_cpumask, INVALIDATE_TLB_VECTOR);
 
-	while (!cpus_empty(flush_cpumask))
+	while (!cpumask_empty(flush_cpumask))
 		/* nothing. lockup detection does not belong here */
 		cpu_relax();
 
@@ -178,25 +174,17 @@ void flush_tlb_current_task(void)
 void flush_tlb_current_task(void)
 {
 	struct mm_struct *mm = current->mm;
-	cpumask_t cpu_mask;
 
 	preempt_disable();
-	cpu_mask = mm->cpu_vm_mask;
-	cpu_clear(smp_processor_id(), cpu_mask);
-
 	local_flush_tlb();
-	if (!cpus_empty(cpu_mask))
-		flush_tlb_others(cpu_mask, mm, TLB_FLUSH_ALL);
+	if (cpumask_any_but(mm->cpu_vm_mask, smp_processor_id()) < nr_cpu_ids)
+		flush_tlb_others(mm->cpu_vm_mask, mm, TLB_FLUSH_ALL);
 	preempt_enable();
 }
 
 void flush_tlb_mm(struct mm_struct *mm)
 {
-	cpumask_t cpu_mask;
-
 	preempt_disable();
-	cpu_mask = *mm->cpu_vm_mask;
-	cpu_clear(smp_processor_id(), cpu_mask);
 
 	if (current->active_mm == mm) {
 		if (current->mm)
@@ -204,8 +192,8 @@ void flush_tlb_mm(struct mm_struct *mm)
 		else
 			leave_mm(smp_processor_id());
 	}
-	if (!cpus_empty(cpu_mask))
-		flush_tlb_others(cpu_mask, mm, TLB_FLUSH_ALL);
+	if (cpumask_any_but(mm->cpu_vm_mask, smp_processor_id()) < nr_cpu_ids)
+		flush_tlb_others(mm->cpu_vm_mask, mm, TLB_FLUSH_ALL);
 
 	preempt_enable();
 }
@@ -213,12 +201,8 @@ void flush_tlb_page(struct vm_area_struc
 void flush_tlb_page(struct vm_area_struct *vma, unsigned long va)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	cpumask_t cpu_mask;
 
 	preempt_disable();
-	cpu_mask = *mm->cpu_vm_mask;
-	cpu_clear(smp_processor_id(), cpu_mask);
-
 	if (current->active_mm == mm) {
 		if (current->mm)
 			__flush_tlb_one(va);
@@ -226,9 +210,8 @@ void flush_tlb_page(struct vm_area_struc
 			leave_mm(smp_processor_id());
 	}
 
-	if (!cpus_empty(cpu_mask))
-		flush_tlb_others(cpu_mask, mm, va);
-
+	if (cpumask_any_but(mm->cpu_vm_mask, smp_processor_id()) < nr_cpu_ids)
+		flush_tlb_others(mm->cpu_vm_mask, mm, va);
 	preempt_enable();
 }
 EXPORT_SYMBOL(flush_tlb_page);
@@ -255,3 +238,9 @@ void reset_lazy_tlbstate(void)
 	per_cpu(cpu_tlbstate, cpu).active_mm = &init_mm;
 }
 
+static int init_flush_cpumask(void)
+{
+	alloc_cpumask_var(&flush_cpumask, GFP_KERNEL);
+	return 0;
+}
+early_initcall(init_flush_cpumask);
diff --git a/arch/x86/kernel/tlb_64.c b/arch/x86/kernel/tlb_64.c
--- a/arch/x86/kernel/tlb_64.c
+++ b/arch/x86/kernel/tlb_64.c
@@ -43,10 +43,10 @@
 union smp_flush_state {
 	struct {
-		cpumask_t flush_cpumask;
 		struct mm_struct *flush_mm;
 		unsigned long flush_va;
 		spinlock_t tlbstate_lock;
+		DECLARE_BITMAP(flush_cpumask, NR_CPUS);
 	};
 	char pad[SMP_CACHE_BYTES];
 } ____cacheline_aligned;
@@ -131,7 +131,7 @@ asmlinkage void smp_invalidate_interrupt
 	sender = ~regs->orig_ax - INVALIDATE_TLB_VECTOR_START;
 	f = &per_cpu(flush_state, sender);
 
-	if (!cpu_isset(cpu, f->flush_cpumask))
+	if (!cpumask_test_cpu(cpu, to_cpumask(f->flush_cpumask)))
 		goto out;
 		/*
 		 * This was a BUG() but until someone can quote me the
@@ -153,19 +153,15 @@ asmlinkage void smp_invalidate_interrupt
 	}
 out:
 	ack_APIC_irq();
-	cpu_clear(cpu, f->flush_cpumask);
+	cpumask_clear(cpu, to_cpumask(f->flush_cpumask));
 	add_pda(irq_tlb_count, 1);
 }
 
-void native_flush_tlb_others(const cpumask_t *cpumaskp, struct mm_struct *mm,
-			     unsigned long va)
+static void flush_tlb_others_ipi(const struct cpumask *cpumask,
+				 struct mm_struct *mm, unsigned long va)
 {
 	int sender;
 	union smp_flush_state *f;
-	cpumask_t cpumask = *cpumaskp;
-
-	if (is_uv_system() && uv_flush_tlb_others(&cpumask, mm, va))
-		return;
 
 	/* Caller has disabled preemption */
 	sender = smp_processor_id() % NUM_INVALIDATE_TLB_VECTORS;
@@ -180,7 +176,8 @@ void native_flush_tlb_others(const cpuma
 
 	f->flush_mm = mm;
 	f->flush_va = va;
-	cpus_or(f->flush_cpumask, cpumask, f->flush_cpumask);
+	cpumask_andnot(to_cpumask(f->flush_cpumask),
+		       cpumask, cpumask_of(smp_processor_id));
 
 	/*
 	 * Make the above memory operations globally visible before
@@ -191,14 +188,32 @@ void native_flush_tlb_others(const cpuma
 	 * We have to send the IPI only to
 	 * CPUs affected.
 	 */
-	send_IPI_mask(&cpumask, INVALIDATE_TLB_VECTOR_START + sender);
+	send_IPI_mask(cpumask, INVALIDATE_TLB_VECTOR_START + sender);
 
-	while (!cpus_empty(f->flush_cpumask))
+	while (!cpumask_empty(to_cpumask(f->flush_cpumask)))
 		cpu_relax();
 
 	f->flush_mm = NULL;
 	f->flush_va = 0;
 	spin_unlock(&f->tlbstate_lock);
+
+
+void native_flush_tlb_others(const struct cpumask *cpumask,
+			     struct mm_struct *mm, unsigned long va)
+{
+	if (is_uv_system()) {
+		cpumask_var_t after_uv_flush;
+
+		if (alloc_cpumask_var(&after_uv_flush, GFP_ATOMIC)) {
+			cpumask_andnot(after_uv_flush,
+				       cpumask, cpumask_of(smp_processor_id()));
+			if (!uv_flush_tlb_others(after_uv_flush, mm, va))
+				flush_tlb_others_ipi(after_uv_flush, mm, va);
+			free_cpumask_var(after_uv_flush);
+			return;
+		}
+	}
+	flush_tlb_others_ipi(cpumask, mm, va);
 }
 
 static int __cpuinit init_smp_flush(void)
@@ -215,34 +230,26 @@ void flush_tlb_current_task(void)
 void flush_tlb_current_task(void)
 {
 	struct mm_struct *mm = current->mm;
-	cpumask_t cpu_mask;
 
 	preempt_disable();
-	cpu_mask = *mm->cpu_vm_mask;
-	cpu_clear(smp_processor_id(), cpu_mask);
-
 	local_flush_tlb();
-	if (!cpus_empty(cpu_mask))
-		flush_tlb_others(cpu_mask, mm, TLB_FLUSH_ALL);
+	if (cpumask_any_but(mm->cpu_vm_mask, smp_processor_id()) < nr_cpu_ids)
+		flush_tlb_others(mm->cpu_vm_mask, mm, TLB_FLUSH_ALL);
 	preempt_enable();
 }
 
 void flush_tlb_mm(struct mm_struct *mm)
 {
-	cpumask_t cpu_mask;
 
 	preempt_disable();
-	cpu_mask = *mm->cpu_vm_mask;
-	cpu_clear(smp_processor_id(), cpu_mask);
-
 	if (current->active_mm == mm) {
 		if (current->mm)
 			local_flush_tlb();
 		else
 			leave_mm(smp_processor_id());
 	}
-	if (!cpus_empty(cpu_mask))
-		flush_tlb_others(cpu_mask, mm, TLB_FLUSH_ALL);
+	if (cpumask_any_but(mm->cpu_vm_mask, smp_processor_id()) < nr_cpu_ids)
+		flush_tlb_others(mm->cpu_vm_mask, mm, TLB_FLUSH_ALL);
 
 	preempt_enable();
 }
@@ -250,11 +257,8 @@ void flush_tlb_page(struct vm_area_struc
 void flush_tlb_page(struct vm_area_struct *vma, unsigned long va)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	cpumask_t cpu_mask;
 
 	preempt_disable();
-	cpu_mask = *mm->cpu_vm_mask;
-	cpu_clear(smp_processor_id(), cpu_mask);
 
 	if (current->active_mm == mm) {
 		if (current->mm)
@@ -263,8 +267,8 @@ void flush_tlb_page(struct vm_area_struc
 			leave_mm(smp_processor_id());
 	}
 
-	if (!cpus_empty(cpu_mask))
-		flush_tlb_others(cpu_mask, mm, va);
+	if (cpumask_any_but(mm->cpu_vm_mask, smp_processor_id()) < nr_cpu_ids)
+		flush_tlb_others(mm->cpu_vm_mask, mm, va);
 
 	preempt_enable();
 }
diff --git a/arch/x86/kernel/tlb_uv.c b/arch/x86/kernel/tlb_uv.c
--- a/arch/x86/kernel/tlb_uv.c
+++ b/arch/x86/kernel/tlb_uv.c
@@ -212,11 +212,11 @@ static int uv_wait_completion(struct bau
  * The cpumaskp mask contains the cpus the broadcast was sent to.
  *
  * Returns 1 if all remote flushing was done. The mask is zeroed.
- * Returns 0 if some remote flushing remains to be done. The mask is left
- * unchanged.
+ * Returns 0 if some remote flushing remains to be done. The mask will have
+ * some bits still set.
  */
 int uv_flush_send_and_wait(int cpu, int this_blade, struct bau_desc *bau_desc,
-			   cpumask_t *cpumaskp)
+			   struct cpumask *cpumaskp)
 {
 	int completion_status = 0;
 	int right_shift;
@@ -263,13 +263,13 @@ int uv_flush_send_and_wait(int cpu, int 
 	 * Success, so clear the remote cpu's from the mask so we don't
 	 * use the IPI method of shootdown on them.
 	 */
-	for_each_cpu_mask(bit, *cpumaskp) {
+	for_each_cpu(bit, cpumaskp) {
 		blade = uv_cpu_to_blade_id(bit);
 		if (blade == this_blade)
 			continue;
-		cpu_clear(bit, *cpumaskp);
+		cpumask_clear_cpu(bit, cpumaskp);
 	}
-	if (!cpus_empty(*cpumaskp))
+	if (!cpumask_empty(cpumaskp))
 		return 0;
 	return 1;
 }
@@ -296,7 +296,7 @@ int uv_flush_send_and_wait(int cpu, int 
  * Returns 1 if all remote flushing was done.
  * Returns 0 if some remote flushing remains to be done.
  */
-int uv_flush_tlb_others(cpumask_t *cpumaskp, struct mm_struct *mm,
+int uv_flush_tlb_others(struct cpumask *cpumaskp, struct mm_struct *mm,
 			unsigned long va)
 {
 	int i;
@@ -315,7 +315,7 @@ int uv_flush_tlb_others(cpumask_t *cpuma
 	bau_nodes_clear(&bau_desc->distribution, UV_DISTRIBUTION_SIZE);
 
 	i = 0;
-	for_each_cpu_mask(bit, *cpumaskp) {
+	for_each_cpu(bit, cpumaskp) {
 		blade = uv_cpu_to_blade_id(bit);
 		BUG_ON(blade > (UV_DISTRIBUTION_SIZE - 1));
 		if (blade == this_blade) {
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -633,35 +633,27 @@ static void xen_flush_tlb_single(unsigne
 	preempt_enable();
 }
 
-static void xen_flush_tlb_others(const cpumask_t *cpus, struct mm_struct *mm,
-				 unsigned long va)
+static void xen_flush_tlb_others(const struct cpumask *cpus,
+				 struct mm_struct *mm, unsigned long va)
 {
 	struct {
 		struct mmuext_op op;
-		cpumask_t mask;
+		DECLARE_BITMAP(mask, NR_CPUS);
 	} *args;
-	cpumask_t cpumask = *cpus;
 	struct multicall_space mcs;
 
-	/*
-	 * A couple of (to be removed) sanity checks:
-	 *
-	 * - current CPU must not be in mask
-	 * - mask must exist :)
-	 */
-	BUG_ON(cpus_empty(cpumask));
-	BUG_ON(cpu_isset(smp_processor_id(), cpumask));
+	BUG_ON(cpumask_empty(cpus));
 	BUG_ON(!mm);
-
-	/* If a CPU which we ran on has gone down, OK. */
-	cpus_and(cpumask, cpumask, cpu_online_map);
-	if (cpus_empty(cpumask))
-		return;
 
 	mcs = xen_mc_entry(sizeof(*args));
 	args = mcs.args;
-	args->mask = cpumask;
-	args->op.arg2.vcpumask = &args->mask;
+	args->op.arg2.vcpumask = to_cpumask(args->mask);
+
+	/* Remove us, and any offline CPUS. */
+	cpumask_and(to_cpumask(args->mask), cpus, cpu_online_mask);
+	cpumask_clear_cpu(smp_processor_id(), to_cpumask(args->mask));
+	if (unlikely(cpumask_empty(to_cpumask(args->mask))))
+		goto issue;
 
 	if (va == TLB_FLUSH_ALL) {
 		args->op.cmd = MMUEXT_TLB_FLUSH_MULTI;
@@ -672,6 +664,7 @@ static void xen_flush_tlb_others(const c
 
 	MULTI_mmuext_op(mcs.mc, &args->op, 1, NULL, DOMID_SELF);
 
+issue:
 	xen_mc_issue(PARAVIRT_LAZY_MMU);
 }

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/4] cpumask: use maxcpus=NUM to extend the cpu limit as well as restrict the limit
  2008-12-11 11:28 ` [PATCH 3/4] cpumask: use maxcpus=NUM to extend the cpu limit as well as restrict the limit Mike Travis
  2008-12-11 13:41   ` Heiko Carstens
@ 2008-12-12 11:41   ` Rusty Russell
  2008-12-12 15:38     ` Mike Travis
  1 sibling, 1 reply; 18+ messages in thread
From: Rusty Russell @ 2008-12-12 11:41 UTC (permalink / raw)
  To: Mike Travis; +Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, linux-kernel

On Thursday 11 December 2008 21:58:09 Mike Travis wrote:
> Impact: allow adding additional cpus.
> 
> Use maxcpus=NUM kernel parameter to extend the number of possible cpus as well
> as (currently) limit them.  Any cpus >= number of present cpus will disabled.
...

These two bits of logic are very similar: can we merge them?

> -	possible = num_processors + disabled_cpus;
> -	if (possible > NR_CPUS)
> -		possible = NR_CPUS;
> +	if (setup_max_cpus == -1)	/* not specified */
> +		possible = num_processors + disabled_cpus;
> +	else if (setup_max_cpus == 0)	/* UP mode forced */
> +		possible = 1;
> +	else				/* user specified */
> +		possible = setup_max_cpus;
> +
> +	if (possible > CONFIG_NR_CPUS) {
> +		printk(KERN_WARNING
> +			"%d Processors exceeds NR_CPUS limit of %d\n",
> +			possible, CONFIG_NR_CPUS);
> +		possible = CONFIG_NR_CPUS;
> +	}
...
> -extern unsigned int setup_max_cpus;
> +extern int setup_max_cpus;
> +static inline int maxcpus(void)
> +{
> +	int maxcpus = setup_max_cpus;
> +
> +	if (maxcpus == -1 || maxcpus > CONFIG_NR_CPUS)
> +		maxcpus = CONFIG_NR_CPUS;
> +	else if (maxcpus == 0)
> +		maxcpus = 1;
> +
> +	return maxcpus;
> +}

If this didn't truncate to CONFIG_NR_CPUS, it could still be used here:

> @@ -425,7 +423,7 @@ static void __init smp_init(void)
>  
>  	/* FIXME: This should be done in userspace --RR */
>  	for_each_present_cpu(cpu) {
> -		if (num_online_cpus() >= setup_max_cpus)
> +		if (num_online_cpus() >= maxcpus())
>  			break;
>  		if (!cpu_online(cpu))
>  			cpu_up(cpu);

And it turns out no one actually uses the smp_cpus_done() arg, so
I don't know what the semantics are supposed to be.

> @@ -433,7 +431,7 @@ static void __init smp_init(void)
>  
>  	/* Any cleanup work */
>  	printk(KERN_INFO "Brought up %ld CPUs\n", (long)num_online_cpus());
> -	smp_cpus_done(setup_max_cpus);
> +	smp_cpus_done(maxcpus());

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] cpumask: add sysfs displays for configured and disabled cpu maps
  2008-12-11 11:28 ` [PATCH 4/4] cpumask: add sysfs displays for configured and disabled cpu maps Mike Travis
@ 2008-12-12 11:44   ` Rusty Russell
  0 siblings, 0 replies; 18+ messages in thread
From: Rusty Russell @ 2008-12-12 11:44 UTC (permalink / raw)
  To: Mike Travis; +Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, linux-kernel

On Thursday 11 December 2008 21:58:10 Mike Travis wrote:
> Impact: add new functionality.
> 
> Add sysfs files "kernel_max" and "offline" to display the max CPU index
> allowed (NR_CPUS-1), and the map of cpus that are offline. 

As discussed, I like this patch.  kernel_max tells you the kernel limit,
and offline tells you if there are any beyond that (other archs need
enhancement for this tho, which would be nice).

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/4] cpumask: use maxcpus=NUM to extend the cpu limit as well as restrict the limit
  2008-12-12 11:41   ` Rusty Russell
@ 2008-12-12 15:38     ` Mike Travis
  0 siblings, 0 replies; 18+ messages in thread
From: Mike Travis @ 2008-12-12 15:38 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, linux-kernel,
	Heiko Carstens

Rusty Russell wrote:
> On Thursday 11 December 2008 21:58:09 Mike Travis wrote:
>> Impact: allow adding additional cpus.
>>
>> Use maxcpus=NUM kernel parameter to extend the number of possible cpus as well
>> as (currently) limit them.  Any cpus >= the number of present cpus will be disabled.
> ...
> 
> These two bits of logic are very similar: can we merge them?
> 
...

Since there's already precedent for "possible_cpus=XXX", might that be a better
alternative?  In that case maxcpus would not need to change at all.
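
Just to sketch what that would look like (the names here are made up, and
-1 as the "not specified" default is an assumption, not code from a patch):

static int setup_possible_cpus = -1;	/* -1 == not given on the command line */

static int __init _setup_possible_cpus(char *str)
{
	get_option(&str, &setup_possible_cpus);
	return 0;
}
early_param("possible_cpus", _setup_possible_cpus);

The possible-map setup would then honor setup_possible_cpus, and maxcpus=
would keep its current meaning.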

Thanks,
Mike

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/4] x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
  2008-12-12 11:06   ` Rusty Russell
@ 2008-12-12 16:37     ` Mike Travis
  2008-12-13 12:03       ` Rusty Russell
  0 siblings, 1 reply; 18+ messages in thread
From: Mike Travis @ 2008-12-12 16:37 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, linux-kernel

Rusty Russell wrote:
> On Thursday 11 December 2008 21:58:08 Mike Travis wrote:
>> Impact: fix potential problem.
>>
>> In determining the destination apicid, there are usually three cpumasks
>> that are considered: the incoming cpumask arg, cfg->domain and the
>> cpu_online_mask.  Since we are just introducing the cpu_mask_to_apicid_and
>> function, make sure it includes the cpu_online_mask in its evaluation.
> 
> Yerk.  Can we really "fail" cpu_mask_to_apicid_and with no repercussions?
> And does it make sense to try to fix this there?

Fail?  The only failure is if there is not a cpu that satisfies the conjunction
of the three masks, in which case it returns BAD_APICID.

The old procedure was to:

function(..., cpumask_t mask)
{
	cpumask_t tmp;

	cpus_and(tmp, mask, cfg->domain);
	...
	cpus_and(tmp, tmp, cpu_online_map);
	dest = cpu_mask_to_apicid(tmp);
	...
}

So making cpu_mask_to_apicid_and return:

	dest = cpu_mask_to_apicid(mask1 & mask2 & cpu_online_mask);

maintains this compatibility.
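
Concretely, a physflat-style implementation of that convention comes out
looking roughly like this (a sketch, not the exact committed code):

static unsigned int cpu_mask_to_apicid_and(const struct cpumask *cpumask,
					   const struct cpumask *andmask)
{
	int cpu;

	/* first cpu that is in cpumask, in andmask, and also online */
	for_each_cpu_and(cpu, cpumask, andmask) {
		if (cpumask_test_cpu(cpu, cpu_online_mask))
			return per_cpu(x86_cpu_to_apicid, cpu);
	}
	return BAD_APICID;	/* no cpu satisfies all three masks */
}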

It turns out that there are two functions that did not AND all three masks,
but those two used TARGET_CPUS as one of the arguments.  All the x86
variations except NUMAQ return only TARGET_CPUS cpus that are online
(and that's probably a mistake in NUMAQ), so the above AND'ing is a
NOP in those cases and harmless.

> 
> This is not a new problem with the cpumask patches is it?  I toyed with a
> patch which converts flush_tlb_others, and it actually ensures that those
> cases never hand an offline mask to cpu_mask_to_apicid_and as a side
> effect (others still might).

No, I introduced the problem when I added cpu_mask_to_apicid_and(), and I
wanted to fix it before it was "officially" released.  I've been doing
benchmark testing with random HOTPLUG off and on events, and it's becoming
clear that setting the destination apicid to an offline cpu is definitely
a no-no. ;-)

Thanks,
Mike

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/4] x86: fix assign_irq_vector boot up problem
  2008-12-12  9:20     ` Ingo Molnar
@ 2008-12-12 18:10       ` Mike Travis
  2008-12-12 19:06         ` Mike Travis
  0 siblings, 1 reply; 18+ messages in thread
From: Mike Travis @ 2008-12-12 18:10 UTC (permalink / raw)
  To: Ingo Molnar, Rusty Russell
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, linux-kernel

Ingo Molnar wrote:
> * Rusty Russell <rusty@rustcorp.com.au> wrote:
> 
>> On Thursday 11 December 2008 21:58:07 Mike Travis wrote:
>>> Impact: fix boot up problem.
>>>
>>> Fix a problem encountered with the Intel SATA-AHCI disk driver
>>> right at system startup.  Cpumask_intersects really needs to be
>>> a 3-way intersect, and since we need a cpumask_var_t later on,
>>> then just use it for the 3-way intersect as well.
>> This one looks fine.
>>
>> My plan was for Ingo to pull that for-ingo tree into his cpus4096 tree 
>> and take the x86 patches from there.  But he hasn't so maybe I should 
>> take this chance to fold that patch in?
> 
> i have no objections against the bits - just the sparseirq complication 
> came in. A lot of effort went into irq/sparseirq's io_apic.c changes and 
> cleanup.
> 
> So to do this cleanly, i merged those bits into cpus4096 and the 
> x86/reboot bits as well - now the plan would be for Mike to send a 
> (rebased) series against that base. I tried a plain merge and the 
> conflicts in io_apic.c were a horrendous 76 rejects due to the 
> irq/sparseirq interaction. Also, some of the commit subjects looked a 
> bit raw so this bit of the tree needs to be redone.
> 
> (Note that the existing cpumask-base+scheduler bits in cpus4096 are 
> golden already and we don't have to touch them in any way, it's just the 
> new x86 bits and new cpumask infrastructure bits that look odd or 
> clashy.)
> 
> 	Ingo

I have the merged result for io_apic, and I believe it should also be
very clean, in that it accurately maintains the logic of both patchsets.
(About 6 hours to work through the complete file.)

What I'm confused about is how I'm supposed to send this rebased series.  Do you only
want the patches in:

	.../pub/scm/linux/kernel/git/rusty/linux-2.6-for-ingo.git

And I've no clue how to cause a git tree to rebase.  What I can do is extract
them into quilt patches, reapply and fix conflicts, and then send them
(as quilt patches).  I'll do that today, and if you need a git tree to pull from,
then we'll figure out how to do that (with Rusty's help since, again, I do not
have an external source to provide one).

Also, you mention in a separate mail:

    Please tidy up the commit logs of new cpumask bits and don't leave bits in 
    it like:

     e861b55: cpumask: Add CONFIG_CPUMASK_OFFSTACK
     65bda29: cpumask:clock_event_device-takes-cpumask-ptr
     07d73a8: cpumask:irq-functions-take-cpumask_t-ptr
     01fdd7d: cpumask:convert-few-difficult-cpumask_t-users
     9cc67bb: cpumask:centralize-common-maps

These patches went in via linux-next/rr, and are not part of the x86 patchset that
I'm working on.  I can only offer to help resolve the conflicts, though again I'm
not sure how I would send them to you.  Would it be the commit from a "merge conflict
resolution" sent as a git-bundle?  [I've not done this yet so I'm really guessing.]

The tree for these changes is in:

	.../pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask.git

and they include many, many non-x86 arch files, so sending them to you is not what
should be done, yes?

Rusty - can you fix these subject lines in your git tree?

Thanks!
Mike

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/4] x86: fix assign_irq_vector boot up problem
  2008-12-12 18:10       ` Mike Travis
@ 2008-12-12 19:06         ` Mike Travis
  0 siblings, 0 replies; 18+ messages in thread
From: Mike Travis @ 2008-12-12 19:06 UTC (permalink / raw)
  To: Ingo Molnar, Rusty Russell
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, linux-kernel

Mike Travis wrote:
> Ingo Molnar wrote:
...
>> So to do this cleanly, i merged those bits into cpus4096 and the 
>> x86/reboot bits as well - now the plan would be for Mike to send a 
>> (rebased) series against that base. I tried a plain merge and the 
...
> And I've no clue how to cause a git tree to rebase.  What I can do is extract
...

Well I have a clue now (hmm, git-rebase, who would have thunk. ;-)

I'll rebase the for-ingo tree, but I'm still confused about those
files that are in linux-next/rr since they are entering upstream
via linux-next.  Won't any changes I make be dropped when you
push the cpus4096 branch upstream?

Thanks,
Mike

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/4] x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
  2008-12-12 16:37     ` Mike Travis
@ 2008-12-13 12:03       ` Rusty Russell
  0 siblings, 0 replies; 18+ messages in thread
From: Rusty Russell @ 2008-12-13 12:03 UTC (permalink / raw)
  To: Mike Travis; +Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, linux-kernel

On Saturday 13 December 2008 03:07:06 Mike Travis wrote:
> Rusty Russell wrote:
> > On Thursday 11 December 2008 21:58:08 Mike Travis wrote:
> >> Impact: fix potential problem.
> >>
> >> In determining the destination apicid, there are usually three cpumasks
> >> that are considered: the incoming cpumask arg, cfg->domain and the
> >> cpu_online_mask.  Since we are just introducing the cpu_mask_to_apicid_and
> >> function, make sure it includes the cpu_online_mask in its evaluation.
> > 
> > Yerk.  Can we really "fail" cpu_mask_to_apicid_and with no repercussions?
> > And does it make sense to try to fix this there?
> 
> Fail?  The only failure is if there is not a cpu that satisfies the conjunction
> of the three masks, in which case it returns BAD_APICID.

That patch showed alloc_cpumask_var() in some implementations of
cpu_mask_to_apicid_and().  This is bad.

> The old procedure was to:
> 
> function(..., cpumask_t mask)
> {
> 	cpumask_t tmp;
> 
> 	cpus_and(tmp, mask, cfg->domain);
> 	...
> 	cpus_and(tmp, tmp, cpu_online_map);
> 	dest = cpu_mask_to_apicid(tmp);
> 	...
> }
> 
> So making cpu_mask_to_apicid_and return:
> 
> 	dest = cpu_mask_to_apicid(mask1 & mask2 & cpu_online_mask);
> 
> maintains this compatibility.

But that was the purpose of x86:set_desc_affinity.patch.  It centralized
those conventions, and other than being a nice cleanup, it got that right.

See below.

Now, there are some places which call cpu_mask_to_apicid_and() and don't
include the online mask, but I checked: they were that way before.  Possibly
an existing bug?

Otherwise, it could be that setting the desc->affinity earlier (as this patch
does) has caused a problem.  Which of these code paths were you running?

Cheers,
Rusty.

===
x86: centralize common code for set_affinity variants.

Impact: cleanup, remove on-stack cpumasks

Everyone checks that a cpu is online, calls assign_irq_vector, and
also uses a temporary cpumask.

I *think* it's legal to set the desc->affinity in place in all cases,
so I avoid the temporary cpumask.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@redhat.com>
---
 arch/x86/kernel/io_apic.c |  112 ++++++++++++++--------------------------------
 1 file changed, 36 insertions(+), 76 deletions(-)

diff -r 07e2723b2a5d arch/x86/kernel/io_apic.c
--- a/arch/x86/kernel/io_apic.c	Tue Nov 18 11:20:08 2008 +1030
+++ b/arch/x86/kernel/io_apic.c	Tue Nov 18 11:40:35 2008 +1030
@@ -361,33 +361,38 @@
 
 static int assign_irq_vector(int irq, const struct cpumask *mask);
 
+/* Either sets desc->affinity to a valid value, and returns cpu_mask_to_apicid
+ * of that, or returns BAD_APICID and leaves desc->affinity untouched. */
+static unsigned int set_desc_affinity(unsigned irq, const struct cpumask *mask)
+{
+	struct irq_cfg *cfg;
+	struct irq_desc *desc;
+
+	if (!cpumask_intersects(mask, cpu_online_mask))
+		return BAD_APICID;
+
+	cfg = irq_cfg(irq);
+	if (assign_irq_vector(irq, mask))
+		return BAD_APICID;
+
+	desc = irq_to_desc(irq);
+	cpumask_and(&desc->affinity, to_cpumask(cfg->domain), mask);
+	return cpu_mask_to_apicid(&desc->affinity);
+}
+
 static void set_ioapic_affinity_irq(unsigned int irq,
 				    const struct cpumask *mask)
 {
-	struct irq_cfg *cfg;
 	unsigned long flags;
 	unsigned int dest;
-	cpumask_t tmp;
-	struct irq_desc *desc;
 
-	if (!cpumask_intersects(mask, cpu_online_mask))
-		return;
-
-	cfg = irq_cfg(irq);
-	if (assign_irq_vector(irq, mask))
-		return;
-
-	cpumask_and(&tmp, &cfg->domain, mask);
-	dest = cpu_mask_to_apicid(&tmp);
-	/*
-	 * Only the high 8 bits are valid.
-	 */
-	dest = SET_APIC_LOGICAL_ID(dest);
-
-	desc = irq_to_desc(irq);
 	spin_lock_irqsave(&ioapic_lock, flags);
-	__target_IO_APIC_irq(irq, dest, cfg->vector);
-	cpumask_copy(&desc->affinity, mask);
+	dest = set_desc_affinity(irq, mask);
+	if (dest != BAD_APICID) {
+		/* Only the high 8 bits are valid. */
+		dest = SET_APIC_LOGICAL_ID(dest);
+		__target_IO_APIC_irq(irq, dest, irq_cfg(irq)->vector);
+	}
 	spin_unlock_irqrestore(&ioapic_lock, flags);
 }
 #endif /* CONFIG_SMP */
@@ -3017,32 +3022,21 @@
 #ifdef CONFIG_SMP
 static void set_msi_irq_affinity(unsigned int irq, const struct cpumask *mask)
 {
-	struct irq_cfg *cfg;
 	struct msi_msg msg;
 	unsigned int dest;
-	cpumask_t tmp;
-	struct irq_desc *desc;
 
-	if (!cpumask_intersects(mask, cpu_online_mask))
+	dest = set_desc_affinity(irq, mask);
+	if (dest == BAD_APICID)
 		return;
-
-	if (assign_irq_vector(irq, mask))
-		return;
-
-	cfg = irq_cfg(irq);
-	cpumask_and(&tmp, &cfg->domain, mask);
-	dest = cpu_mask_to_apicid(&tmp);
 
 	read_msi_msg(irq, &msg);
 
 	msg.data &= ~MSI_DATA_VECTOR_MASK;
-	msg.data |= MSI_DATA_VECTOR(cfg->vector);
+	msg.data |= MSI_DATA_VECTOR(irq_cfg(irq)->vector);
 	msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
 	msg.address_lo |= MSI_ADDR_DEST_ID(dest);
 
 	write_msi_msg(irq, &msg);
-	desc = irq_to_desc(irq);
-	cpumask_copy(&desc->affinity, mask);
 }
 
 #ifdef CONFIG_INTR_REMAP
@@ -3055,23 +3049,17 @@
 {
 	struct irq_cfg *cfg;
 	unsigned int dest;
-	cpumask_t tmp;
 	struct irte irte;
 	struct irq_desc *desc;
-
-	if (!cpumask_intersects(mask, cpu_online_mask))
-		return;
 
 	if (get_irte(irq, &irte))
 		return;
 
-	if (assign_irq_vector(irq, mask))
+	dest = set_desc_affinity(irq, mask);
+	if (dest == BAD_APICID)
 		return;
 
 	cfg = irq_cfg(irq);
-	cpumask_and(&tmp, to_cpumask(cfg->domain), mask);
-	dest = cpu_mask_to_apicid(&tmp);
-
 	irte.vector = cfg->vector;
 	irte.dest_id = IRTE_DEST(dest);
 
@@ -3106,9 +3094,6 @@
 		}
 		cfg->move_in_progress = 0;
 	}
-
-	desc = irq_to_desc(irq);
-	cpumask_copy(&desc->affinity, mask);
 }
 #endif
 #endif /* CONFIG_SMP */
@@ -3315,19 +3300,12 @@
 	struct irq_cfg *cfg;
 	struct msi_msg msg;
 	unsigned int dest;
-	cpumask_t tmp;
-	struct irq_desc *desc;
 
-	if (!cpumask_intersects(mask, cpu_online_mask))
-		return;
-
-	if (assign_irq_vector(irq, mask))
+	dest = set_desc_affinity(irq, mask);
+	if (dest == BAD_APICID)
 		return;
 
 	cfg = irq_cfg(irq);
-	cpumask_and(&tmp, &cfg->domain, mask);
-	dest = cpu_mask_to_apicid(&tmp);
-
 	dmar_msi_read(irq, &msg);
 
 	msg.data &= ~MSI_DATA_VECTOR_MASK;
@@ -3336,8 +3314,6 @@
 	msg.address_lo |= MSI_ADDR_DEST_ID(dest);
 
 	dmar_msi_write(irq, &msg);
-	desc = irq_to_desc(irq);
-	cpumask_copy(&desc->affinity, mask);
 }
 #endif /* CONFIG_SMP */
 
@@ -3373,20 +3349,14 @@
 static void hpet_msi_set_affinity(unsigned int irq, const struct cpumask *mask)
 {
 	struct irq_cfg *cfg;
-	struct irq_desc *desc;
 	struct msi_msg msg;
 	unsigned int dest;
-	cpumask_t tmp;
 
-	if (!cpumask_intersects(mask, cpu_online_mask))
-		return;
-
-	if (assign_irq_vector(irq, mask))
+	dest = set_desc_affinity(irq, mask);
+	if (dest == BAD_APICID)
 		return;
 
 	cfg = irq_cfg(irq);
-	cpumask_and(&tmp, to_cpumask(cfg->domain), mask);
-	dest = cpu_mask_to_apicid(&tmp);
 
 	hpet_msi_read(irq, &msg);
 
@@ -3396,8 +3366,6 @@
 	msg.address_lo |= MSI_ADDR_DEST_ID(dest);
 
 	hpet_msi_write(irq, &msg);
-	desc = irq_to_desc(irq);
-	cpumask_copy(&desc->affinity, mask);
 }
 #endif /* CONFIG_SMP */
 
@@ -3455,22 +3423,14 @@
 {
 	struct irq_cfg *cfg;
 	unsigned int dest;
-	cpumask_t tmp;
-	struct irq_desc *desc;
 
-	if (!cpumask_intersects(mask, cpu_online_mask))
-		return;
-
-	if (assign_irq_vector(irq, mask))
+	dest = set_desc_affinity(irq, mask);
+	if (dest == BAD_APICID)
 		return;
 
 	cfg = irq_cfg(irq);
-	cpumask_and(&tmp, to_cpumask(cfg->domain), mask);
-	dest = cpu_mask_to_apicid(&tmp);
 
 	target_ht_irq(irq, dest, cfg->vector);
-	desc = irq_to_desc(irq);
-	cpumask_copy(&desc->affinity, mask);
 }
 #endif
 

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2008-12-13 12:04 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-12-11 11:28 [PATCH 0/4] cpumask: fixups and additions Mike Travis
2008-12-11 11:28 ` [PATCH 1/4] x86: fix assign_irq_vector boot up problem Mike Travis
2008-12-12  8:27   ` Rusty Russell
2008-12-12  9:20     ` Ingo Molnar
2008-12-12 18:10       ` Mike Travis
2008-12-12 19:06         ` Mike Travis
2008-12-11 11:28 ` [PATCH 2/4] x86: fix cpu_mask_to_apicid_and to include cpu_online_mask Mike Travis
2008-12-12 11:06   ` Rusty Russell
2008-12-12 16:37     ` Mike Travis
2008-12-13 12:03       ` Rusty Russell
2008-12-11 11:28 ` [PATCH 3/4] cpumask: use maxcpus=NUM to extend the cpu limit as well as restrict the limit Mike Travis
2008-12-11 13:41   ` Heiko Carstens
2008-12-11 18:19     ` Mike Travis
2008-12-12 10:03       ` Heiko Carstens
2008-12-12 11:41   ` Rusty Russell
2008-12-12 15:38     ` Mike Travis
2008-12-11 11:28 ` [PATCH 4/4] cpumask: add sysfs displays for configured and disabled cpu maps Mike Travis
2008-12-12 11:44   ` Rusty Russell
