linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Make for_each_cpu_mask a bit smaller
@ 2008-05-11 13:50 Alexander van Heukelum
  2008-05-11 13:57 ` Paul Jackson
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Alexander van Heukelum @ 2008-05-11 13:50 UTC (permalink / raw)
  To: Andrew Morton, Paul Jackson
  Cc: Ingo Molnar, Thomas Gleixner, linux-arch, LKML, heukelum

The for_each_cpu_mask loop is used quite often in the kernel. It
makes use of two functions: first_cpu and next_cpu. This patch
changes for_each_cpu_mask to use only one function: a newly
introduced find_next_cpu. Each use of the for_each_cpu_mask
constuct then becomes a few bytes smaller. An x86_64 defconfig
kernel is about 2000 bytes smaller with this patch applied:

   text	   data	    bss	    dec	    hex	filename
5395732	 976736	 734280	7106748	 6c70bc	vmlinux.orig
5393639	 976736	 734280	7104655	 6c688f	vmlinux

Runs fine on qemu UP/SMP x86_64/i386.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>

---

Hello Andrew,

Could you add this patch to -mm?

Greetings,
	Alexander

 include/linux/cpumask.h |   14 ++++++++------
 lib/cpumask.c           |    6 ++++++
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 9650806..a760e29 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -221,9 +221,11 @@ int __first_cpu(const cpumask_t *srcp);
 #define first_cpu(src) __first_cpu(&(src))
 int __next_cpu(int n, const cpumask_t *srcp);
 #define next_cpu(n, src) __next_cpu((n), &(src))
+int find_next_cpu_mask(int n, const cpumask_t *srcp);
 #else
-#define first_cpu(src)		({ (void)(src); 0; })
-#define next_cpu(n, src)	({ (void)(src); 1; })
+#define first_cpu(src)			({ (void)(src); 0; })
+#define next_cpu(n, src)		({ (void)(src); 1; })
+#define find_next_cpu_mask(n, src)	({ (void)(src); n; })
 #endif
 
 #ifdef CONFIG_HAVE_CPUMASK_OF_CPU_MAP
@@ -351,10 +353,10 @@ static inline void __cpus_fold(cpumask_t *dstp, const cpumask_t *origp,
 }
 
 #if NR_CPUS > 1
-#define for_each_cpu_mask(cpu, mask)		\
-	for ((cpu) = first_cpu(mask);		\
-		(cpu) < NR_CPUS;		\
-		(cpu) = next_cpu((cpu), (mask)))
+#define for_each_cpu_mask(cpu, mask)				\
+	for ((cpu) = 0;						\
+		(cpu) = find_next_cpu_mask((cpu), &(mask)),	\
+		(cpu) < NR_CPUS; (cpu)++)
 #else /* NR_CPUS == 1 */
 #define for_each_cpu_mask(cpu, mask)		\
 	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)
diff --git a/lib/cpumask.c b/lib/cpumask.c
index bb4f76d..93dd6ca 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -15,6 +15,12 @@ int __next_cpu(int n, const cpumask_t *srcp)
 }
 EXPORT_SYMBOL(__next_cpu);
 
+int find_next_cpu_mask(int n, const cpumask_t *srcp)
+{
+	return find_next_bit(srcp->bits, NR_CPUS, n);
+}
+EXPORT_SYMBOL(find_next_cpu_mask);
+
 int __any_online_cpu(const cpumask_t *mask)
 {
 	int cpu;

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] Make for_each_cpu_mask a bit smaller
  2008-05-11 13:50 [PATCH] Make for_each_cpu_mask a bit smaller Alexander van Heukelum
@ 2008-05-11 13:57 ` Paul Jackson
  2008-05-11 14:14 ` Paul Jackson
  2008-05-11 15:24 ` [PATCH] " Matthew Wilcox
  2 siblings, 0 replies; 16+ messages in thread
From: Paul Jackson @ 2008-05-11 13:57 UTC (permalink / raw)
  To: Alexander van Heukelum
  Cc: akpm, mingo, tglx, linux-arch, linux-kernel, heukelum, travis

Alexander wrote:
> Each use of the for_each_cpu_mask
> constuct then becomes a few bytes smaller. 

Looks good, at first glance.

Could you include Mike Travis <travis@sgi.com> on
the CC list for this work?  He's been having
fun lately with cpumasks.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.940.382.4214

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Make for_each_cpu_mask a bit smaller
  2008-05-11 13:50 [PATCH] Make for_each_cpu_mask a bit smaller Alexander van Heukelum
  2008-05-11 13:57 ` Paul Jackson
@ 2008-05-11 14:14 ` Paul Jackson
  2008-05-11 16:06   ` [RFC/PATCH] Make for_each_node_mask out-of-line Alexander van Heukelum
  2008-05-11 15:24 ` [PATCH] " Matthew Wilcox
  2 siblings, 1 reply; 16+ messages in thread
From: Paul Jackson @ 2008-05-11 14:14 UTC (permalink / raw)
  To: Alexander van Heukelum
  Cc: akpm, mingo, tglx, linux-arch, linux-kernel, heukelum, travis

Alexander wrote:
> This patch
> changes for_each_cpu_mask to use only one function: a newly
> introduced find_next_cpu.

I believe that it's for_each_cpu_mask which is newly introduced,
not find_next_cpu ... just a typo, granted.

Any chance that you could make the same change to nodemask.h?
Where practical, I like to keep cpumask.h and nodemask.h the same.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.940.382.4214

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Make for_each_cpu_mask a bit smaller
  2008-05-11 13:50 [PATCH] Make for_each_cpu_mask a bit smaller Alexander van Heukelum
  2008-05-11 13:57 ` Paul Jackson
  2008-05-11 14:14 ` Paul Jackson
@ 2008-05-11 15:24 ` Matthew Wilcox
  2008-05-11 16:19   ` Alexander van Heukelum
  2 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2008-05-11 15:24 UTC (permalink / raw)
  To: Alexander van Heukelum
  Cc: Andrew Morton, Paul Jackson, Ingo Molnar, Thomas Gleixner,
	linux-arch, LKML, heukelum

On Sun, May 11, 2008 at 03:50:39PM +0200, Alexander van Heukelum wrote:
>  #if NR_CPUS > 1
> -#define for_each_cpu_mask(cpu, mask)		\
> -	for ((cpu) = first_cpu(mask);		\
> -		(cpu) < NR_CPUS;		\
> -		(cpu) = next_cpu((cpu), (mask)))
> +#define for_each_cpu_mask(cpu, mask)				\
> +	for ((cpu) = 0;						\
> +		(cpu) = find_next_cpu_mask((cpu), &(mask)),	\
> +		(cpu) < NR_CPUS; (cpu)++)

For anyone else having similar cognitive dissonance while reading this
thinking "But won't the first call to find_next_cpu_mask return a number
> 0", the answer is "no, find_next_bit returns the next set bit that's
>= the number passed in, which is why we need both the cpu++ and
find_next_cpu_mask".  

> +int find_next_cpu_mask(int n, const cpumask_t *srcp)
> +{
> +	return find_next_bit(srcp->bits, NR_CPUS, n);
> +}
> +EXPORT_SYMBOL(find_next_cpu_mask);

Maybe a better name for this function would help.  I can't think of a
good one right now though.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFC/PATCH] Make for_each_node_mask out-of-line
  2008-05-11 14:14 ` Paul Jackson
@ 2008-05-11 16:06   ` Alexander van Heukelum
  2008-05-11 21:01     ` Paul Jackson
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander van Heukelum @ 2008-05-11 16:06 UTC (permalink / raw)
  To: Paul Jackson, Mike Travis
  Cc: Andrew Morton, Ingo Molnar, Thomas Gleixner, linux-arch, LKML, heukelum

The for_each_node_mask loop makes use of two inlined functions:
first_node and next_node. This patch changes for_each_node_mask
to use only one out-of-line function: find_next_node_mask. An
x86_64 defconfig kernel is about 1500 bytes smaller with this
patch applied:

   text	   data	    bss	    dec	    hex	filename
5395732	 976736	 734280	7106748	 6c70bc	vmlinux.orig
5394174	 976736	 734280	7105190	 6c6aa6	vmlinux

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>

---

Hello,
> Alexander wrote:
> > This patch
> > changes for_each_cpu_mask to use only one function: a newly
> > introduced find_next_cpu.
> 
> I believe that it's for_each_cpu_mask which is newly introduced,
> not find_next_cpu ... just a typo, granted.

I meant find_next_cpu_mask. That was a last-minute change of mind
about the name :/.

> Any chance that you could make the same change to nodemask.h?
> Where practical, I like to keep cpumask.h and nodemask.h the same.

Sure. This patch introduces lib/nodemask.c, but I'm not quite sure
if building it should depend on CONFIG_SMP or something else (NUMA?).
When is MAX_NUMNODES 1?

It seems to work on qemu x86_64-smp and i386-up.

I'ld be happy to take a stab at aligning the cpumask and nodemask
code even more by uninlining some more functions and using stubs
for the MAX_NUMNODES=1 case.

Greetings,
	Alexander

 include/linux/nodemask.h |   11 +++++++----
 lib/Makefile             |    2 +-
 lib/nodemask.c           |   10 ++++++++++
 3 files changed, 18 insertions(+), 5 deletions(-)
 create mode 100644 lib/nodemask.c

diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index 848025c..dbc80f0 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -239,6 +239,8 @@ static inline int __next_node(int n, const nodemask_t *srcp)
 	return min_t(int,MAX_NUMNODES,find_next_bit(srcp->bits, MAX_NUMNODES, n+1));
 }
 
+int find_next_node_mask(int n, const nodemask_t *srcp);
+
 #define nodemask_of_node(node)						\
 ({									\
 	typeof(_unused_nodemask_arg_) m;				\
@@ -347,10 +349,11 @@ static inline void __nodes_fold(nodemask_t *dstp, const nodemask_t *origp,
 }
 
 #if MAX_NUMNODES > 1
-#define for_each_node_mask(node, mask)			\
-	for ((node) = first_node(mask);			\
-		(node) < MAX_NUMNODES;			\
-		(node) = next_node((node), (mask)))
+#define for_each_node_mask(node, mask)				\
+	for ((node) = 0;					\
+		(node) = find_next_node_mask((node), &(mask)),	\
+		(node) < MAX_NUMNODES;				\
+		(node)++)
 #else /* MAX_NUMNODES == 1 */
 #define for_each_node_mask(node, mask)			\
 	if (!nodes_empty(mask))				\
diff --git a/lib/Makefile b/lib/Makefile
index 74b0cfb..48fc85c 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -9,7 +9,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 proportions.o prio_heap.o ratelimit.o
 
 lib-$(CONFIG_MMU) += ioremap.o
-lib-$(CONFIG_SMP) += cpumask.o
+lib-$(CONFIG_SMP) += cpumask.o nodemask.o
 
 lib-y	+= kobject.o kref.o klist.o
 
diff --git a/lib/nodemask.c b/lib/nodemask.c
new file mode 100644
index 0000000..08341d3
--- /dev/null
+++ b/lib/nodemask.c
@@ -0,0 +1,10 @@
+#include <linux/kernel.h>
+#include <linux/bitops.h>
+#include <linux/nodemask.h>
+#include <linux/module.h>
+
+int find_next_node_mask(int n, const nodemask_t *srcp)
+{
+	return find_next_bit(srcp->bits, NR_CPUS, n);
+}
+EXPORT_SYMBOL(find_next_node_mask);


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] Make for_each_cpu_mask a bit smaller
  2008-05-11 15:24 ` [PATCH] " Matthew Wilcox
@ 2008-05-11 16:19   ` Alexander van Heukelum
  2008-05-11 22:01     ` Matthew Wilcox
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander van Heukelum @ 2008-05-11 16:19 UTC (permalink / raw)
  To: Matthew Wilcox, Alexander van Heukelum
  Cc: Andrew Morton, Paul Jackson, Ingo Molnar, Thomas Gleixner,
	linux-arch, LKML


On Sun, 11 May 2008 09:24:40 -0600, "Matthew Wilcox" <matthew@wil.cx>
said:
> On Sun, May 11, 2008 at 03:50:39PM +0200, Alexander van Heukelum wrote:
> >  #if NR_CPUS > 1
> > -#define for_each_cpu_mask(cpu, mask)		\
> > -	for ((cpu) = first_cpu(mask);		\
> > -		(cpu) < NR_CPUS;		\
> > -		(cpu) = next_cpu((cpu), (mask)))
> > +#define for_each_cpu_mask(cpu, mask)				\
> > +	for ((cpu) = 0;						\
> > +		(cpu) = find_next_cpu_mask((cpu), &(mask)),	\
> > +		(cpu) < NR_CPUS; (cpu)++)
> 
> For anyone else having similar cognitive dissonance while reading this
> thinking "But won't the first call to find_next_cpu_mask return a number
> > 0", the answer is "no, find_next_bit returns the next set bit that's
> >= the number passed in, which is why we need both the cpu++ and
> find_next_cpu_mask".  

That's how it works, indeed.

> > +int find_next_cpu_mask(int n, const cpumask_t *srcp)
> > +{
> > +	return find_next_bit(srcp->bits, NR_CPUS, n);
> > +}
> > +EXPORT_SYMBOL(find_next_cpu_mask);
> 
> Maybe a better name for this function would help.  I can't think of a
> good one right now though.

I can't think of a better name, and there is find_next_bit of which
find_next_cpu_mask is just a wrapper. I think the name is good enough.

Greetings,
    Alexander
-- 
  Alexander van Heukelum
  heukelum@fastmail.fm

-- 
http://www.fastmail.fm - Access all of your messages and folders
                          wherever you are


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC/PATCH] Make for_each_node_mask out-of-line
  2008-05-11 16:06   ` [RFC/PATCH] Make for_each_node_mask out-of-line Alexander van Heukelum
@ 2008-05-11 21:01     ` Paul Jackson
  2008-05-12 12:04       ` Alexander van Heukelum
  0 siblings, 1 reply; 16+ messages in thread
From: Paul Jackson @ 2008-05-11 21:01 UTC (permalink / raw)
  To: Alexander van Heukelum
  Cc: travis, akpm, mingo, tglx, linux-arch, linux-kernel, heukelum

Alexander wrote:
> Sure. This patch introduces lib/nodemask.c, but I'm not quite sure
> if building it should depend on CONFIG_SMP or something else (NUMA?).
> When is MAX_NUMNODES 1?

Well ... I'm pretty sure it made sense to depend on SMP, back when
it was first added.  However that might have changed.  I recall
vaguely that there has been discussion of this CONFIG_SMP dependency
every year or so, but I don't have the time right now to dig through
the archives and code to figure it out.

So ... offhand ... good questions, but I don't have answers.


> I'ld be happy to take a stab at aligning the cpumask and nodemask
> code even more by uninlining some more functions and using stubs
> for the MAX_NUMNODES=1 case.

That could be good ... though could you co-ordinate with Mike Travis
first, to minimize the risks of merge conflicts with what he's doing?

You kernel text space saving in the first patch seemed worth going
ahead with even if it did conflict a little, and I liked the matching
nodemask patch, just to keep things in sync.  Other nodemask cleanup
is a little lower priority in my book, so should make a modest effort
to co-ordinate with more critical patches, to minimize conflict.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.940.382.4214

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Make for_each_cpu_mask a bit smaller
  2008-05-11 16:19   ` Alexander van Heukelum
@ 2008-05-11 22:01     ` Matthew Wilcox
  2008-05-12 11:04       ` Alexander van Heukelum
  0 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2008-05-11 22:01 UTC (permalink / raw)
  To: Alexander van Heukelum
  Cc: Alexander van Heukelum, Andrew Morton, Paul Jackson, Ingo Molnar,
	Thomas Gleixner, linux-arch, LKML

On Sun, May 11, 2008 at 06:19:39PM +0200, Alexander van Heukelum wrote:
> 
> On Sun, 11 May 2008 09:24:40 -0600, "Matthew Wilcox" <matthew@wil.cx>
> said:
> > On Sun, May 11, 2008 at 03:50:39PM +0200, Alexander van Heukelum wrote:
> > >  #if NR_CPUS > 1
> > > -#define for_each_cpu_mask(cpu, mask)		\
> > > -	for ((cpu) = first_cpu(mask);		\
> > > -		(cpu) < NR_CPUS;		\
> > > -		(cpu) = next_cpu((cpu), (mask)))
> > > +#define for_each_cpu_mask(cpu, mask)				\
> > > +	for ((cpu) = 0;						\
> > > +		(cpu) = find_next_cpu_mask((cpu), &(mask)),	\
> > > +		(cpu) < NR_CPUS; (cpu)++)
> > 
> > For anyone else having similar cognitive dissonance while reading this
> > thinking "But won't the first call to find_next_cpu_mask return a number
> > > 0", the answer is "no, find_next_bit returns the next set bit that's
> > >= the number passed in, which is why we need both the cpu++ and
> > find_next_cpu_mask".  
> 
> That's how it works, indeed.
> 
> > > +int find_next_cpu_mask(int n, const cpumask_t *srcp)
> > > +{
> > > +	return find_next_bit(srcp->bits, NR_CPUS, n);
> > > +}
> > > +EXPORT_SYMBOL(find_next_cpu_mask);
> > 
> > Maybe a better name for this function would help.  I can't think of a
> > good one right now though.
> 
> I can't think of a better name, and there is find_next_bit of which
> find_next_cpu_mask is just a wrapper. I think the name is good enough.

How about doing it this way?

#define for_each_cpu_mask(cpu, mask)				\
	for ((cpu) = -1;					\
	     (cpu) < NR_CPUS;					\
	     (cpu) = find_next_cpu_mask((cpu), &(mask)))

int find_next_cpu_mask(int n, const cpumask_t *srcp)
{
	return find_next_bit(srcp->bits, NR_CPUS, ++n);
}

That actually behaves the way I'd expect a function called
'find_next_cpu_mask' to work.  It also abuses the 'for' condtion
less and might take a little less text space.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Make for_each_cpu_mask a bit smaller
  2008-05-11 22:01     ` Matthew Wilcox
@ 2008-05-12 11:04       ` Alexander van Heukelum
  2008-05-12 11:56         ` Matthew Wilcox
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander van Heukelum @ 2008-05-12 11:04 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Alexander van Heukelum, Andrew Morton, Paul Jackson, Ingo Molnar,
	Thomas Gleixner, linux-arch, LKML

On Sun, 11 May 2008 16:01:12 -0600, "Matthew Wilcox" <matthew@wil.cx>
said:
> On Sun, May 11, 2008 at 06:19:39PM +0200, Alexander van Heukelum wrote:
> > 
> > On Sun, 11 May 2008 09:24:40 -0600, "Matthew Wilcox" <matthew@wil.cx>
> > said:
> > > On Sun, May 11, 2008 at 03:50:39PM +0200, Alexander van Heukelum wrote:
> > > >  #if NR_CPUS > 1
> > > > -#define for_each_cpu_mask(cpu, mask)		\
> > > > -	for ((cpu) = first_cpu(mask);		\
> > > > -		(cpu) < NR_CPUS;		\
> > > > -		(cpu) = next_cpu((cpu), (mask)))
> > > > +#define for_each_cpu_mask(cpu, mask)				\
> > > > +	for ((cpu) = 0;						\
> > > > +		(cpu) = find_next_cpu_mask((cpu), &(mask)),	\
> > > > +		(cpu) < NR_CPUS; (cpu)++)
> > > 
> > > For anyone else having similar cognitive dissonance while reading this
> > > thinking "But won't the first call to find_next_cpu_mask return a number
> > > > 0", the answer is "no, find_next_bit returns the next set bit that's
> > > >= the number passed in, which is why we need both the cpu++ and
> > > find_next_cpu_mask".  
> > 
> > That's how it works, indeed.
> > 
> > > > +int find_next_cpu_mask(int n, const cpumask_t *srcp)
> > > > +{
> > > > +	return find_next_bit(srcp->bits, NR_CPUS, n);
> > > > +}
> > > > +EXPORT_SYMBOL(find_next_cpu_mask);
> > > 
> > > Maybe a better name for this function would help.  I can't think of a
> > > good one right now though.
> > 
> > I can't think of a better name, and there is find_next_bit of which
> > find_next_cpu_mask is just a wrapper. I think the name is good enough.
> 
> How about doing it this way?
> 
> #define for_each_cpu_mask(cpu, mask)				\
> 	for ((cpu) = -1;					\
> 	     (cpu) < NR_CPUS;					\
> 	     (cpu) = find_next_cpu_mask((cpu), &(mask)))
> 
> int find_next_cpu_mask(int n, const cpumask_t *srcp)
> {
> 	return find_next_bit(srcp->bits, NR_CPUS, ++n);
> }
> 
> That actually behaves the way I'd expect a function called
> 'find_next_cpu_mask' to work.  It also abuses the 'for' condtion
> less and might take a little less text space.

But it does not work.

It introduces a stray cpu=-1 iteration if cpu happens to be
(replaced by) a signed variable.

It skips the entire loop if cpu happens to be unsigned.

I don't think that using 'for' in a less conventional way
is bad if it is hidden in a macro, as long as the name of
the macro makes the intention sufficiently clear.

I think of find_next_cpu_mask(cpu, mask) as: "find next
cpu-index in mask, starting at index cpu". And similar
with find_next_bit.

As for the text-space argument, I think you might be right.
Just not on i386/x86_64 where initialising a register to -1
can be done in three bytes, initialising to 0 in two bytes
and an increment in one byte :-).

Greetings,
    Alexander

> -- 
> Intel are signing my paycheques ... these opinions are still mine
> "Bill, look, we understand that you're interested in selling us this
> operating system, but compare it to ours.  We can't possibly take such
> a retrograde step."
-- 
  Alexander van Heukelum
  heukelum@fastmail.fm

-- 
http://www.fastmail.fm - mmm... Fastmail...


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Make for_each_cpu_mask a bit smaller
  2008-05-12 11:04       ` Alexander van Heukelum
@ 2008-05-12 11:56         ` Matthew Wilcox
  0 siblings, 0 replies; 16+ messages in thread
From: Matthew Wilcox @ 2008-05-12 11:56 UTC (permalink / raw)
  To: Alexander van Heukelum
  Cc: Alexander van Heukelum, Andrew Morton, Paul Jackson, Ingo Molnar,
	Thomas Gleixner, linux-arch, LKML

On Mon, May 12, 2008 at 01:04:54PM +0200, Alexander van Heukelum wrote:
> > #define for_each_cpu_mask(cpu, mask)				\
> > 	for ((cpu) = -1;					\
> > 	     (cpu) < NR_CPUS;					\
> > 	     (cpu) = find_next_cpu_mask((cpu), &(mask)))
> > 
> > int find_next_cpu_mask(int n, const cpumask_t *srcp)
> > {
> > 	return find_next_bit(srcp->bits, NR_CPUS, ++n);
> > }
> > 
> > That actually behaves the way I'd expect a function called
> > 'find_next_cpu_mask' to work.  It also abuses the 'for' condtion
> > less and might take a little less text space.
> 
> But it does not work.

You're right, of course.  Either of these should work:

#define for_each_cpu_mask(cpu, mask)				\
	for ((cpu) = find_next_cpu_mask(-1, &(mask));		\
	     (cpu) < NR_CPUS;					\
	     (cpu) = find_next_cpu_mask((cpu), &(mask)))

#define for_each_cpu_mask(cpu, mask)				\
	for ((cpu) = -1;					\
	     (cpu) = find_next_cpu_mask((cpu), &(mask)), (cpu) < NR_CPUS; \
	     )

int find_next_cpu_mask(int n, const cpumask_t *srcp)
{
	return find_next_bit(srcp->bits, NR_CPUS, ++n);
}

> I think of find_next_cpu_mask(cpu, mask) as: "find next
> cpu-index in mask, starting at index cpu". And similar
> with find_next_bit.

If you're determined to stick to your original formulation, renaming it
to find_valid_cpu_mask() would work better.

> As for the text-space argument, I think you might be right.
> Just not on i386/x86_64 where initialising a register to -1
> can be done in three bytes, initialising to 0 in two bytes
> and an increment in one byte :-).

While the initialisation may take one extra byte, the increment is
done in the function, and so takes at least one byte out of the loop.
Of course on other instruction sets, that's probably 4 bytes out of the
loop and they can initialise to -1 as cheaply as init to 0, so it's a
win on non-x86 and neutral on x86.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC/PATCH] Make for_each_node_mask out-of-line
  2008-05-11 21:01     ` Paul Jackson
@ 2008-05-12 12:04       ` Alexander van Heukelum
  2008-05-12 16:45         ` Mike Travis
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander van Heukelum @ 2008-05-12 12:04 UTC (permalink / raw)
  To: Mike Travis, Paul Jackson
  Cc: Andrew Morton, Ingo Molnar, Thomas Gleixner, linux-arch,
	linux-kernel, Alexander van Heukelum

On Sun, 11 May 2008 16:01:04 -0500, "Paul Jackson" <pj@sgi.com> said:
> Alexander wrote:
> > Sure. This patch introduces lib/nodemask.c, but I'm not quite sure
> > if building it should depend on CONFIG_SMP or something else (NUMA?).
> > When is MAX_NUMNODES 1?
> 
> Well ... I'm pretty sure it made sense to depend on SMP, back when
> it was first added.  However that might have changed.  I recall
> vaguely that there has been discussion of this CONFIG_SMP dependency
> every year or so, but I don't have the time right now to dig through
> the archives and code to figure it out.
> 
> So ... offhand ... good questions, but I don't have answers.
> 
> > I'ld be happy to take a stab at aligning the cpumask and nodemask
> > code even more by uninlining some more functions and using stubs
> > for the MAX_NUMNODES=1 case.
> 
> That could be good ... though could you co-ordinate with Mike Travis
> first, to minimize the risks of merge conflicts with what he's doing?

I believe the x86#testing tree includes Mike's work? The two patches
in this thread apply fine to current x86#testing.

> You kernel text space saving in the first patch seemed worth going
> ahead with even if it did conflict a little, and I liked the matching
> nodemask patch, just to keep things in sync.  Other nodemask cleanup
> is a little lower priority in my book, so should make a modest effort
> to co-ordinate with more critical patches, to minimize conflict.

Thanks for your guidance.

Greetings,
    Alexander

> -- 
>                   I won't rest till it's the best ...
>                   Programmer, Linux Scalability
>                   Paul Jackson <pj@sgi.com> 1.940.382.4214
-- 
  Alexander van Heukelum
  heukelum@fastmail.fm

-- 
http://www.fastmail.fm - Accessible with your email software
                          or over the web


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC/PATCH] Make for_each_node_mask out-of-line
  2008-05-12 12:04       ` Alexander van Heukelum
@ 2008-05-12 16:45         ` Mike Travis
  2008-05-12 19:00           ` [PATCHv2] Make for_each_cpu_mask a bit smaller Alexander van Heukelum
  0 siblings, 1 reply; 16+ messages in thread
From: Mike Travis @ 2008-05-12 16:45 UTC (permalink / raw)
  To: Alexander van Heukelum
  Cc: Paul Jackson, Andrew Morton, Ingo Molnar, Thomas Gleixner,
	linux-arch, linux-kernel, Alexander van Heukelum

Alexander van Heukelum wrote:
> On Sun, 11 May 2008 16:01:04 -0500, "Paul Jackson" <pj@sgi.com> said:
>> Alexander wrote:
>>> Sure. This patch introduces lib/nodemask.c, but I'm not quite sure
>>> if building it should depend on CONFIG_SMP or something else (NUMA?).
>>> When is MAX_NUMNODES 1?
>> Well ... I'm pretty sure it made sense to depend on SMP, back when
>> it was first added.  However that might have changed.  I recall
>> vaguely that there has been discussion of this CONFIG_SMP dependency
>> every year or so, but I don't have the time right now to dig through
>> the archives and code to figure it out.
>>
>> So ... offhand ... good questions, but I don't have answers.
>>
>>> I'ld be happy to take a stab at aligning the cpumask and nodemask
>>> code even more by uninlining some more functions and using stubs
>>> for the MAX_NUMNODES=1 case.
>> That could be good ... though could you co-ordinate with Mike Travis
>> first, to minimize the risks of merge conflicts with what he's doing?
> 
> I believe the x86#testing tree includes Mike's work? The two patches
> in this thread apply fine to current x86#testing.

Yes, my only change right now is to use nr_cpu_ids instead of NR_CPUS
which short cuts a huge chunk out of the back end of the loop.  And my
changes are in sched/latest (which includes x86/latest).

Thanks,
Mike

> 
>> You kernel text space saving in the first patch seemed worth going
>> ahead with even if it did conflict a little, and I liked the matching
>> nodemask patch, just to keep things in sync.  Other nodemask cleanup
>> is a little lower priority in my book, so should make a modest effort
>> to co-ordinate with more critical patches, to minimize conflict.
> 
> Thanks for your guidance.
> 
> Greetings,
>     Alexander
> 
>> -- 
>>                   I won't rest till it's the best ...
>>                   Programmer, Linux Scalability
>>                   Paul Jackson <pj@sgi.com> 1.940.382.4214


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCHv2] Make for_each_cpu_mask a bit smaller
  2008-05-12 16:45         ` Mike Travis
@ 2008-05-12 19:00           ` Alexander van Heukelum
  2008-05-12 21:45             ` Andreas Schwab
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander van Heukelum @ 2008-05-12 19:00 UTC (permalink / raw)
  To: Mike Travis, Ingo Molnar
  Cc: Andrew Morton, Paul Jackson, Thomas Gleixner, Matthew Wilcox,
	ARCH, LKML, Alexander van Heukelum

The for_each_cpu_mask loop is used quite often in the kernel. It
makes use of two functions: first_cpu and next_cpu. This patch
changes for_each_cpu_mask to use only the latter. Because next_cpu
finds the next eligible cpu _after_ the given one, the iteration
variable has to be initialized to -1 and next_cpu has to be
called with this value before the first iteration. An x86_64
defconfig kernel (from sched/latest) is about 2500 bytes smaller
with this patch applied:

   text	   data	    bss	    dec	    hex	filename
6222517	 917952	 749932	7890401	 7865e1	vmlinux.orig
6219922	 917952	 749932	7887806	 785bbe	vmlinux

The same size reduction is seen for defconfig+MAXSMP

   text	   data	    bss	    dec	    hex	filename
6241772	2563968	1492716	10298456	 9d2458	vmlinux.orig
6239211	2563968	1492716	10295895	 9d1a57	vmlinux

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>

---

Hello Mike, Ingo,

Here is a version of the for_each_cpu_mask change that applies on
top of sched/latest. I think it is non-intrusive enough to put
it in sched/latest?

This version reuses the already-existing api next_cpu instead
of inventing a new one; initializing the iteration counter to
-1 was suggested by Matthew Wilcox.

Greetings,
	Alexander

 include/linux/cpumask.h |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 73434e5..74b748b 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -377,10 +377,10 @@ int __any_online_cpu(const cpumask_t *mask);
 #define first_cpu(src)		__first_cpu(&(src))
 #define next_cpu(n, src)	__next_cpu((n), &(src))
 #define any_online_cpu(mask) __any_online_cpu(&(mask))
-#define for_each_cpu_mask(cpu, mask)		\
-	for ((cpu) = first_cpu(mask);		\
-		(cpu) < NR_CPUS;		\
-		(cpu) = next_cpu((cpu), (mask)))
+#define for_each_cpu_mask(cpu, mask)			\
+	for ((cpu) = ~((typeof(cpu))0);			\
+		(cpu) = next_cpu((cpu), (mask)),	\
+		(cpu) < NR_CPUS; )
 #endif
 
 #if NR_CPUS <= 64
@@ -394,10 +394,10 @@ int __any_online_cpu(const cpumask_t *mask);
 int __next_cpu_nr(int n, const cpumask_t *srcp);
 #define next_cpu_nr(n, src)	__next_cpu_nr((n), &(src))
 #define cpus_weight_nr(cpumask)	__cpus_weight(&(cpumask), nr_cpu_ids)
-#define for_each_cpu_mask_nr(cpu, mask)		\
-	for ((cpu) = first_cpu(mask);		\
-		(cpu) < nr_cpu_ids;		\
-		(cpu) = next_cpu_nr((cpu), (mask)))
+#define for_each_cpu_mask_nr(cpu, mask)			\
+	for ((cpu) = ~((typeof(cpu))0);			\
+		(cpu) = next_cpu_nr((cpu), (mask)),	\
+		(cpu) < nr_cpu_ids; )
 
 #endif /* NR_CPUS > 64 */
 

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCHv2] Make for_each_cpu_mask a bit smaller
  2008-05-12 19:00           ` [PATCHv2] Make for_each_cpu_mask a bit smaller Alexander van Heukelum
@ 2008-05-12 21:45             ` Andreas Schwab
  2008-05-13  9:28               ` [PATCHv3] " Alexander van Heukelum
  0 siblings, 1 reply; 16+ messages in thread
From: Andreas Schwab @ 2008-05-12 21:45 UTC (permalink / raw)
  To: Alexander van Heukelum
  Cc: Mike Travis, Ingo Molnar, Andrew Morton, Paul Jackson,
	Thomas Gleixner, Matthew Wilcox, ARCH, LKML,
	Alexander van Heukelum

Alexander van Heukelum <heukelum@mailshack.com> writes:

> +#define for_each_cpu_mask(cpu, mask)			\
> +	for ((cpu) = ~((typeof(cpu))0);			\

There is no need for such a complicated expression, -1 will work for
every (arithmetic) type.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCHv3] Make for_each_cpu_mask a bit smaller
  2008-05-12 21:45             ` Andreas Schwab
@ 2008-05-13  9:28               ` Alexander van Heukelum
  2008-05-13 12:02                 ` Ingo Molnar
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander van Heukelum @ 2008-05-13  9:28 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Mike Travis, Ingo Molnar, Andrew Morton, Paul Jackson,
	Thomas Gleixner, Matthew Wilcox, ARCH, LKML,
	Alexander van Heukelum

The for_each_cpu_mask loop is used quite often in the kernel. It
makes use of two functions: first_cpu and next_cpu. This patch
changes for_each_cpu_mask to use only the latter. Because next_cpu
finds the next eligible cpu _after_ the given one, the iteration
variable has to be initialized to -1 and next_cpu has to be
called with this value before the first iteration. An x86_64
defconfig kernel (from sched/latest) is about 2500 bytes smaller
with this patch applied:

   text	   data	    bss	    dec	    hex	filename
6222517	 917952	 749932	7890401	 7865e1	vmlinux.orig
6219922	 917952	 749932	7887806	 785bbe	vmlinux

The same size reduction is seen for defconfig+MAXSMP

   text	   data	    bss	    dec	    hex	filename
6241772	2563968	1492716	10298456	 9d2458	vmlinux.orig
6239211	2563968	1492716	10295895	 9d1a57	vmlinux

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>

---

On Mon, May 12, 2008, Andreas Schwab wrote:
> > +#define for_each_cpu_mask(cpu, mask)			\
> > +	for ((cpu) = ~((typeof(cpu))0);			\
> 
> There is no need for such a complicated expression, -1 will work for
> every (arithmetic) type.

Indeed, thanks.

This version applies on top of sched/latest.

This version reuses the already-existing api next_cpu instead
of inventing a new one; initializing the iteration counter to
-1 was suggested by Matthew Wilcox. Now with a -1 instead of
an overly carefull ~((typeof(cpu))0). "-1" is properly sign-
extended even if cpu is u64 in a 32-bit environment.

Greetings,
	Alexander

 include/linux/cpumask.h |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 73434e5..c24a556 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -377,10 +377,10 @@ int __any_online_cpu(const cpumask_t *mask);
 #define first_cpu(src)		__first_cpu(&(src))
 #define next_cpu(n, src)	__next_cpu((n), &(src))
 #define any_online_cpu(mask) __any_online_cpu(&(mask))
-#define for_each_cpu_mask(cpu, mask)		\
-	for ((cpu) = first_cpu(mask);		\
-		(cpu) < NR_CPUS;		\
-		(cpu) = next_cpu((cpu), (mask)))
+#define for_each_cpu_mask(cpu, mask)			\
+	for ((cpu) = -1;				\
+		(cpu) = next_cpu((cpu), (mask)),	\
+		(cpu) < NR_CPUS; )
 #endif
 
 #if NR_CPUS <= 64
@@ -394,10 +394,10 @@ int __any_online_cpu(const cpumask_t *mask);
 int __next_cpu_nr(int n, const cpumask_t *srcp);
 #define next_cpu_nr(n, src)	__next_cpu_nr((n), &(src))
 #define cpus_weight_nr(cpumask)	__cpus_weight(&(cpumask), nr_cpu_ids)
-#define for_each_cpu_mask_nr(cpu, mask)		\
-	for ((cpu) = first_cpu(mask);		\
-		(cpu) < nr_cpu_ids;		\
-		(cpu) = next_cpu_nr((cpu), (mask)))
+#define for_each_cpu_mask_nr(cpu, mask)			\
+	for ((cpu) = -1;				\
+		(cpu) = next_cpu_nr((cpu), (mask)),	\
+		(cpu) < nr_cpu_ids; )
 
 #endif /* NR_CPUS > 64 */
 

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCHv3] Make for_each_cpu_mask a bit smaller
  2008-05-13  9:28               ` [PATCHv3] " Alexander van Heukelum
@ 2008-05-13 12:02                 ` Ingo Molnar
  0 siblings, 0 replies; 16+ messages in thread
From: Ingo Molnar @ 2008-05-13 12:02 UTC (permalink / raw)
  To: Alexander van Heukelum
  Cc: Andreas Schwab, Mike Travis, Andrew Morton, Paul Jackson,
	Thomas Gleixner, Matthew Wilcox, ARCH, LKML,
	Alexander van Heukelum


* Alexander van Heukelum <heukelum@mailshack.com> wrote:

> The for_each_cpu_mask loop is used quite often in the kernel. It makes 
> use of two functions: first_cpu and next_cpu. This patch changes 
> for_each_cpu_mask to use only the latter. Because next_cpu finds the 
> next eligible cpu _after_ the given one, the iteration variable has to 
> be initialized to -1 and next_cpu has to be called with this value 
> before the first iteration. An x86_64 defconfig kernel (from 
> sched/latest) is about 2500 bytes smaller with this patch applied:
> 
>    text	   data	    bss	    dec	    hex	filename
> 6222517	 917952	 749932	7890401	 7865e1	vmlinux.orig
> 6219922	 917952	 749932	7887806	 785bbe	vmlinux
> 
> The same size reduction is seen for defconfig+MAXSMP
> 
>    text	   data	    bss	    dec	    hex	filename
> 6241772	2563968	1492716	10298456	 9d2458	vmlinux.orig
> 6239211	2563968	1492716	10295895	 9d1a57	vmlinux

applied for testing, thanks Alexander.

	Ingo

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2008-05-13 12:03 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-05-11 13:50 [PATCH] Make for_each_cpu_mask a bit smaller Alexander van Heukelum
2008-05-11 13:57 ` Paul Jackson
2008-05-11 14:14 ` Paul Jackson
2008-05-11 16:06   ` [RFC/PATCH] Make for_each_node_mask out-of-line Alexander van Heukelum
2008-05-11 21:01     ` Paul Jackson
2008-05-12 12:04       ` Alexander van Heukelum
2008-05-12 16:45         ` Mike Travis
2008-05-12 19:00           ` [PATCHv2] Make for_each_cpu_mask a bit smaller Alexander van Heukelum
2008-05-12 21:45             ` Andreas Schwab
2008-05-13  9:28               ` [PATCHv3] " Alexander van Heukelum
2008-05-13 12:02                 ` Ingo Molnar
2008-05-11 15:24 ` [PATCH] " Matthew Wilcox
2008-05-11 16:19   ` Alexander van Heukelum
2008-05-11 22:01     ` Matthew Wilcox
2008-05-12 11:04       ` Alexander van Heukelum
2008-05-12 11:56         ` Matthew Wilcox

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).