All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/4] kernel.h further split
@ 2021-10-07  9:51 Andy Shevchenko
  2021-10-07  9:51 ` [PATCH v2 1/4] kernel.h: Drop unneeded <linux/kernel.h> inclusion from other headers Andy Shevchenko
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Andy Shevchenko @ 2021-10-07  9:51 UTC (permalink / raw)
  To: Andy Shevchenko, Peter Zijlstra, Thomas Gleixner, Randy Dunlap,
	linux-kernel, linux-kselftest, kunit-dev, linux-media, netdev
  Cc: Brendan Higgins, Greg Kroah-Hartman, Rafael J. Wysocki,
	Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng, Sakari Ailus,
	Laurent Pinchart, Mauro Carvalho Chehab, Thomas Graf, Herbert Xu,
	Andrew Morton

The kernel.h is a set of something which is not related to each other
and often used in non-crossed compilation units, especially when drivers
need only one or two macro definitions from it.

Here is the split of container_of(). The goals are the following:
- untwist the dependency hell a bit
- drop kernel.h inclusion where it's only used for container_of()
- speed up C preprocessing.

People, like Greg KH and Miguel Ojeda, were asking about the latter.
Read below the methodology and test setup with outcome numbers.

The methodology
===============
The question here is how to measure in the more or less clean way
the C preprocessing time when building a project like Linux kernel.
To answer it, let's look around and see what tools do we have that
may help. Aha, here is ccache tool that seems quite plausible to
be used. Its core idea is to preprocess C file, count hash (MD4)
and compare to ones that are in the cache. If found, return the
object file, avoiding compilation stage.

Taking into account the property of the ccache, configure and use
it in the below steps:

1. Configure kernel with allyesconfig

2. Make it with `make` to be sure that the cache is filled with
   the latest data. I.o.w. warm up the cache.

3. Run `make -s` (silent mode to reduce the influence of
   the unrelated things, like console output) 10 times and
   measure 'real' time spent.

4. Repeat 1-3 for each patch or patch set to get data sets before
   and after.

When we get the raw data, calculating median will show us the number.
Comparing them before and after we will see the difference.

The setup
=========
I have used the Intel x86_64 server platform (see partial output of
 `lscpu` below):

$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  88
  On-line CPU(s) list:   0-87
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
    CPU family:          6
    Model:               79
    Thread(s) per core:  2
    Core(s) per socket:  22
    Socket(s):           2
    Stepping:            1
    CPU max MHz:         3600.0000
    CPU min MHz:         1200.0000
...
Caches (sum of all):
  L1d:                   1.4 MiB (44 instances)
  L1i:                   1.4 MiB (44 instances)
  L2:                    11 MiB (44 instances)
  L3:                    110 MiB (2 instances)
NUMA:
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-21,44-65
  NUMA node1 CPU(s):     22-43,66-87
Vulnerabilities:
  Itlb multihit:         KVM: Mitigation: Split huge pages
  L1tf:                  Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
  Mds:                   Mitigation; Clear CPU buffers; SMT vulnerable
  Meltdown:              Mitigation; PTI
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
  Tsx async abort:       Mitigation; Clear CPU buffers; SMT vulnerable

With the following GCC:

$ gcc --version
gcc (Debian 10.3.0-11) 10.3.0

The commands I have run during the measurement were:

	rm -rf $O
	make O=$O allyesconfig
	time make O=$O -s -j64	# this step has been measured

The raw data and median
=======================
Before patch 2 (yes, I have measured the only patch 2 effect) in the series
(the data is sorted by time):

real    2m8.794s
real    2m11.183s
real    2m11.235s
real    2m11.639s
real    2m11.960s
real    2m12.014s
real    2m12.609s
real    2m13.177s
real    2m13.462s
real    2m19.132s

After patch 2 has been applied:

real    2m8.536s
real    2m8.776s
real    2m9.071s
real    2m9.459s
real    2m9.531s
real    2m9.610s
real    2m10.356s
real    2m10.430s
real    2m11.117s
real    2m11.885s

Median values are:
	131.987s before
	129.571s after

We see the steady speedup as of 1.83%.

Andy Shevchenko (4):
  kernel.h: Drop unneeded <linux/kernel.h> inclusion from other headers
  kernel.h: Split out container_of() and typeof_member() macros
  lib/rhashtable: Replace kernel.h with the necessary inclusions
  kunit: Replace kernel.h with the necessary inclusions

 include/kunit/test.h         | 14 ++++++++++++--
 include/linux/container_of.h | 37 ++++++++++++++++++++++++++++++++++++
 include/linux/kernel.h       | 31 +-----------------------------
 include/linux/kobject.h      |  1 +
 include/linux/list.h         |  6 ++++--
 include/linux/llist.h        |  4 +++-
 include/linux/plist.h        |  5 ++++-
 include/linux/rwsem.h        |  1 -
 include/linux/spinlock.h     |  1 -
 include/media/media-entity.h |  3 ++-
 lib/radix-tree.c             |  6 +++++-
 lib/rhashtable.c             |  7 ++++++-
 12 files changed, 75 insertions(+), 41 deletions(-)
 create mode 100644 include/linux/container_of.h

-- 
2.33.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2 1/4] kernel.h: Drop unneeded <linux/kernel.h> inclusion from other headers
  2021-10-07  9:51 [PATCH v2 0/4] kernel.h further split Andy Shevchenko
@ 2021-10-07  9:51 ` Andy Shevchenko
  2021-10-07  9:51 ` [PATCH v2 2/4] kernel.h: Split out container_of() and typeof_member() macros Andy Shevchenko
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Andy Shevchenko @ 2021-10-07  9:51 UTC (permalink / raw)
  To: Andy Shevchenko, Peter Zijlstra, Thomas Gleixner, Randy Dunlap,
	linux-kernel, linux-kselftest, kunit-dev, linux-media, netdev
  Cc: Brendan Higgins, Greg Kroah-Hartman, Rafael J. Wysocki,
	Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng, Sakari Ailus,
	Laurent Pinchart, Mauro Carvalho Chehab, Thomas Graf, Herbert Xu,
	Andrew Morton

There is no evidence we need kernel.h inclusion in certain headers.
Drop unneeded <linux/kernel.h> inclusion from other headers.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
 include/linux/rwsem.h    | 1 -
 include/linux/spinlock.h | 1 -
 2 files changed, 2 deletions(-)

diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 352c6127cb90..f9348769e558 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -11,7 +11,6 @@
 #include <linux/linkage.h>
 
 #include <linux/types.h>
-#include <linux/kernel.h>
 #include <linux/list.h>
 #include <linux/spinlock.h>
 #include <linux/atomic.h>
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 76a855b3ecde..c04e99edfe92 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -57,7 +57,6 @@
 #include <linux/compiler.h>
 #include <linux/irqflags.h>
 #include <linux/thread_info.h>
-#include <linux/kernel.h>
 #include <linux/stringify.h>
 #include <linux/bottom_half.h>
 #include <linux/lockdep.h>
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 2/4] kernel.h: Split out container_of() and typeof_member() macros
  2021-10-07  9:51 [PATCH v2 0/4] kernel.h further split Andy Shevchenko
  2021-10-07  9:51 ` [PATCH v2 1/4] kernel.h: Drop unneeded <linux/kernel.h> inclusion from other headers Andy Shevchenko
@ 2021-10-07  9:51 ` Andy Shevchenko
  2021-10-07 10:37   ` Greg Kroah-Hartman
  2021-10-07 10:38   ` Greg Kroah-Hartman
  2021-10-07  9:51 ` [PATCH v2 3/4] lib/rhashtable: Replace kernel.h with the necessary inclusions Andy Shevchenko
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 17+ messages in thread
From: Andy Shevchenko @ 2021-10-07  9:51 UTC (permalink / raw)
  To: Andy Shevchenko, Peter Zijlstra, Thomas Gleixner, Randy Dunlap,
	linux-kernel, linux-kselftest, kunit-dev, linux-media, netdev
  Cc: Brendan Higgins, Greg Kroah-Hartman, Rafael J. Wysocki,
	Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng, Sakari Ailus,
	Laurent Pinchart, Mauro Carvalho Chehab, Thomas Graf, Herbert Xu,
	Andrew Morton

kernel.h is being used as a dump for all kinds of stuff for a long time.
Here is the attempt cleaning it up by splitting out container_of() and
typeof_member() macros.

At the same time convert users in the header and other folders to use it.
Though for time being include new header back to kernel.h to avoid twisted
indirected includes for existing users.

Note, there are _a lot_ of headers and modules that include kernel.h solely
for one of these macros and this allows to unburden compiler for the twisted
inclusion paths and to make new code cleaner in the future.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
 include/kunit/test.h         |  2 ++
 include/linux/container_of.h | 37 ++++++++++++++++++++++++++++++++++++
 include/linux/kernel.h       | 31 +-----------------------------
 include/linux/kobject.h      |  1 +
 include/linux/list.h         |  6 ++++--
 include/linux/llist.h        |  4 +++-
 include/linux/plist.h        |  5 ++++-
 include/media/media-entity.h |  3 ++-
 lib/radix-tree.c             |  6 +++++-
 lib/rhashtable.c             |  1 +
 10 files changed, 60 insertions(+), 36 deletions(-)
 create mode 100644 include/linux/container_of.h

diff --git a/include/kunit/test.h b/include/kunit/test.h
index 24b40e5c160b..4d498f496790 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -11,6 +11,8 @@
 
 #include <kunit/assert.h>
 #include <kunit/try-catch.h>
+
+#include <linux/container_of.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/slab.h>
diff --git a/include/linux/container_of.h b/include/linux/container_of.h
new file mode 100644
index 000000000000..f6ee1be0e784
--- /dev/null
+++ b/include/linux/container_of.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_CONTAINER_OF_H
+#define _LINUX_CONTAINER_OF_H
+
+#define typeof_member(T, m)	typeof(((T*)0)->m)
+
+/**
+ * container_of - cast a member of a structure out to the containing structure
+ * @ptr:	the pointer to the member.
+ * @type:	the type of the container struct this is embedded in.
+ * @member:	the name of the member within the struct.
+ *
+ */
+#define container_of(ptr, type, member) ({				\
+	void *__mptr = (void *)(ptr);					\
+	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
+			 !__same_type(*(ptr), void),			\
+			 "pointer type mismatch in container_of()");	\
+	((type *)(__mptr - offsetof(type, member))); })
+
+/**
+ * container_of_safe - cast a member of a structure out to the containing structure
+ * @ptr:	the pointer to the member.
+ * @type:	the type of the container struct this is embedded in.
+ * @member:	the name of the member within the struct.
+ *
+ * If IS_ERR_OR_NULL(ptr), ptr is returned unchanged.
+ */
+#define container_of_safe(ptr, type, member) ({				\
+	void *__mptr = (void *)(ptr);					\
+	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
+			 !__same_type(*(ptr), void),			\
+			 "pointer type mismatch in container_of()");	\
+	IS_ERR_OR_NULL(__mptr) ? ERR_CAST(__mptr) :			\
+		((type *)(__mptr - offsetof(type, member))); })
+
+#endif	/* _LINUX_CONTAINER_OF_H */
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index d416fe3165cb..ad9fdcce9dcf 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -9,6 +9,7 @@
 #include <linux/stddef.h>
 #include <linux/types.h>
 #include <linux/compiler.h>
+#include <linux/container_of.h>
 #include <linux/bitops.h>
 #include <linux/kstrtox.h>
 #include <linux/log2.h>
@@ -482,36 +483,6 @@ static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
 #define __CONCAT(a, b) a ## b
 #define CONCATENATE(a, b) __CONCAT(a, b)
 
-/**
- * container_of - cast a member of a structure out to the containing structure
- * @ptr:	the pointer to the member.
- * @type:	the type of the container struct this is embedded in.
- * @member:	the name of the member within the struct.
- *
- */
-#define container_of(ptr, type, member) ({				\
-	void *__mptr = (void *)(ptr);					\
-	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
-			 !__same_type(*(ptr), void),			\
-			 "pointer type mismatch in container_of()");	\
-	((type *)(__mptr - offsetof(type, member))); })
-
-/**
- * container_of_safe - cast a member of a structure out to the containing structure
- * @ptr:	the pointer to the member.
- * @type:	the type of the container struct this is embedded in.
- * @member:	the name of the member within the struct.
- *
- * If IS_ERR_OR_NULL(ptr), ptr is returned unchanged.
- */
-#define container_of_safe(ptr, type, member) ({				\
-	void *__mptr = (void *)(ptr);					\
-	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
-			 !__same_type(*(ptr), void),			\
-			 "pointer type mismatch in container_of()");	\
-	IS_ERR_OR_NULL(__mptr) ? ERR_CAST(__mptr) :			\
-		((type *)(__mptr - offsetof(type, member))); })
-
 /* Rebuild everything on CONFIG_FTRACE_MCOUNT_RECORD */
 #ifdef CONFIG_FTRACE_MCOUNT_RECORD
 # define REBUILD_DUE_TO_FTRACE_MCOUNT_RECORD
diff --git a/include/linux/kobject.h b/include/linux/kobject.h
index efd56f990a46..bf8371e58b17 100644
--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -15,6 +15,7 @@
 #ifndef _KOBJECT_H_
 #define _KOBJECT_H_
 
+#include <linux/container_of.h>
 #include <linux/types.h>
 #include <linux/list.h>
 #include <linux/sysfs.h>
diff --git a/include/linux/list.h b/include/linux/list.h
index f2af4b4aa4e9..5dc679b373da 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -2,11 +2,13 @@
 #ifndef _LINUX_LIST_H
 #define _LINUX_LIST_H
 
+#include <linux/container_of.h>
+#include <linux/const.h>
 #include <linux/types.h>
 #include <linux/stddef.h>
 #include <linux/poison.h>
-#include <linux/const.h>
-#include <linux/kernel.h>
+
+#include <asm/barrier.h>
 
 /*
  * Circular doubly linked list implementation.
diff --git a/include/linux/llist.h b/include/linux/llist.h
index 24f207b0190b..85bda2d02d65 100644
--- a/include/linux/llist.h
+++ b/include/linux/llist.h
@@ -49,7 +49,9 @@
  */
 
 #include <linux/atomic.h>
-#include <linux/kernel.h>
+#include <linux/container_of.h>
+#include <linux/stddef.h>
+#include <linux/types.h>
 
 struct llist_head {
 	struct llist_node *first;
diff --git a/include/linux/plist.h b/include/linux/plist.h
index 66bab1bca35c..0f352c1d3c80 100644
--- a/include/linux/plist.h
+++ b/include/linux/plist.h
@@ -73,8 +73,11 @@
 #ifndef _LINUX_PLIST_H_
 #define _LINUX_PLIST_H_
 
-#include <linux/kernel.h>
+#include <linux/container_of.h>
 #include <linux/list.h>
+#include <linux/types.h>
+
+#include <asm/bug.h>
 
 struct plist_head {
 	struct list_head node_list;
diff --git a/include/media/media-entity.h b/include/media/media-entity.h
index 09737b47881f..fea489f03d57 100644
--- a/include/media/media-entity.h
+++ b/include/media/media-entity.h
@@ -13,10 +13,11 @@
 
 #include <linux/bitmap.h>
 #include <linux/bug.h>
+#include <linux/container_of.h>
 #include <linux/fwnode.h>
-#include <linux/kernel.h>
 #include <linux/list.h>
 #include <linux/media.h>
+#include <linux/types.h>
 
 /* Enums used internally at the media controller to represent graphs */
 
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index b3afafe46fff..a0f346a095df 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -12,19 +12,21 @@
 #include <linux/bitmap.h>
 #include <linux/bitops.h>
 #include <linux/bug.h>
+#include <linux/container_of.h>
 #include <linux/cpu.h>
 #include <linux/errno.h>
 #include <linux/export.h>
 #include <linux/idr.h>
 #include <linux/init.h>
-#include <linux/kernel.h>
 #include <linux/kmemleak.h>
+#include <linux/math.h>
 #include <linux/percpu.h>
 #include <linux/preempt.h>		/* in_interrupt() */
 #include <linux/radix-tree.h>
 #include <linux/rcupdate.h>
 #include <linux/slab.h>
 #include <linux/string.h>
+#include <linux/types.h>
 #include <linux/xarray.h>
 
 /*
@@ -285,6 +287,8 @@ radix_tree_node_alloc(gfp_t gfp_mask, struct radix_tree_node *parent,
 	return ret;
 }
 
+extern void radix_tree_node_rcu_free(struct rcu_head *head);
+
 void radix_tree_node_rcu_free(struct rcu_head *head)
 {
 	struct radix_tree_node *node =
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index e12bbfb240b8..a422c7dd9126 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -12,6 +12,7 @@
  */
 
 #include <linux/atomic.h>
+#include <linux/container_of.h>
 #include <linux/kernel.h>
 #include <linux/init.h>
 #include <linux/log2.h>
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 3/4] lib/rhashtable: Replace kernel.h with the necessary inclusions
  2021-10-07  9:51 [PATCH v2 0/4] kernel.h further split Andy Shevchenko
  2021-10-07  9:51 ` [PATCH v2 1/4] kernel.h: Drop unneeded <linux/kernel.h> inclusion from other headers Andy Shevchenko
  2021-10-07  9:51 ` [PATCH v2 2/4] kernel.h: Split out container_of() and typeof_member() macros Andy Shevchenko
@ 2021-10-07  9:51 ` Andy Shevchenko
  2021-10-07 11:23   ` Herbert Xu
  2021-10-07  9:51 ` [PATCH v2 4/4] kunit: " Andy Shevchenko
  2021-10-07 10:33 ` [PATCH v2 0/4] kernel.h further split Greg Kroah-Hartman
  4 siblings, 1 reply; 17+ messages in thread
From: Andy Shevchenko @ 2021-10-07  9:51 UTC (permalink / raw)
  To: Andy Shevchenko, Peter Zijlstra, Thomas Gleixner, Randy Dunlap,
	linux-kernel, linux-kselftest, kunit-dev, linux-media, netdev
  Cc: Brendan Higgins, Greg Kroah-Hartman, Rafael J. Wysocki,
	Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng, Sakari Ailus,
	Laurent Pinchart, Mauro Carvalho Chehab, Thomas Graf, Herbert Xu,
	Andrew Morton

When kernel.h is used in the headers it adds a lot into dependency hell,
especially when there are circular dependencies are involved.

Replace kernel.h inclusion with the list of what is really being used.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
 lib/rhashtable.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index a422c7dd9126..01502cf77564 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -12,9 +12,13 @@
  */
 
 #include <linux/atomic.h>
+#include <linux/bit_spinlock.h>
 #include <linux/container_of.h>
-#include <linux/kernel.h>
+#include <linux/err.h>
+#include <linux/export.h>
 #include <linux/init.h>
+#include <linux/jhash.h>
+#include <linux/lockdep.h>
 #include <linux/log2.h>
 #include <linux/sched.h>
 #include <linux/rculist.h>
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 4/4] kunit: Replace kernel.h with the necessary inclusions
  2021-10-07  9:51 [PATCH v2 0/4] kernel.h further split Andy Shevchenko
                   ` (2 preceding siblings ...)
  2021-10-07  9:51 ` [PATCH v2 3/4] lib/rhashtable: Replace kernel.h with the necessary inclusions Andy Shevchenko
@ 2021-10-07  9:51 ` Andy Shevchenko
  2021-10-07 10:33 ` [PATCH v2 0/4] kernel.h further split Greg Kroah-Hartman
  4 siblings, 0 replies; 17+ messages in thread
From: Andy Shevchenko @ 2021-10-07  9:51 UTC (permalink / raw)
  To: Andy Shevchenko, Peter Zijlstra, Thomas Gleixner, Randy Dunlap,
	linux-kernel, linux-kselftest, kunit-dev, linux-media, netdev
  Cc: Brendan Higgins, Greg Kroah-Hartman, Rafael J. Wysocki,
	Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng, Sakari Ailus,
	Laurent Pinchart, Mauro Carvalho Chehab, Thomas Graf, Herbert Xu,
	Andrew Morton

When kernel.h is used in the headers it adds a lot into dependency hell,
especially when there are circular dependencies are involved.

Replace kernel.h inclusion with the list of what is really being used.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
 include/kunit/test.h | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/kunit/test.h b/include/kunit/test.h
index 4d498f496790..d88d9f7ead0a 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -12,12 +12,20 @@
 #include <kunit/assert.h>
 #include <kunit/try-catch.h>
 
+#include <linux/compiler_attributes.h>
 #include <linux/container_of.h>
-#include <linux/kernel.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/kconfig.h>
+#include <linux/kref.h>
+#include <linux/list.h>
 #include <linux/module.h>
 #include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/string.h>
 #include <linux/types.h>
-#include <linux/kref.h>
+
+#include <asm/rwonce.h>
 
 struct kunit_resource;
 
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/4] kernel.h further split
  2021-10-07  9:51 [PATCH v2 0/4] kernel.h further split Andy Shevchenko
                   ` (3 preceding siblings ...)
  2021-10-07  9:51 ` [PATCH v2 4/4] kunit: " Andy Shevchenko
@ 2021-10-07 10:33 ` Greg Kroah-Hartman
  2021-10-07 11:51   ` Andy Shevchenko
  4 siblings, 1 reply; 17+ messages in thread
From: Greg Kroah-Hartman @ 2021-10-07 10:33 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Peter Zijlstra, Thomas Gleixner, Randy Dunlap, linux-kernel,
	linux-kselftest, kunit-dev, linux-media, netdev, Brendan Higgins,
	Rafael J. Wysocki, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng, Sakari Ailus, Laurent Pinchart,
	Mauro Carvalho Chehab, Thomas Graf, Herbert Xu, Andrew Morton

On Thu, Oct 07, 2021 at 12:51:25PM +0300, Andy Shevchenko wrote:
> The kernel.h is a set of something which is not related to each other
> and often used in non-crossed compilation units, especially when drivers
> need only one or two macro definitions from it.
> 
> Here is the split of container_of(). The goals are the following:
> - untwist the dependency hell a bit
> - drop kernel.h inclusion where it's only used for container_of()
> - speed up C preprocessing.
> 
> People, like Greg KH and Miguel Ojeda, were asking about the latter.
> Read below the methodology and test setup with outcome numbers.
> 
> The methodology
> ===============
> The question here is how to measure in the more or less clean way
> the C preprocessing time when building a project like Linux kernel.
> To answer it, let's look around and see what tools do we have that
> may help. Aha, here is ccache tool that seems quite plausible to
> be used. Its core idea is to preprocess C file, count hash (MD4)
> and compare to ones that are in the cache. If found, return the
> object file, avoiding compilation stage.
> 
> Taking into account the property of the ccache, configure and use
> it in the below steps:
> 
> 1. Configure kernel with allyesconfig
> 
> 2. Make it with `make` to be sure that the cache is filled with
>    the latest data. I.o.w. warm up the cache.
> 
> 3. Run `make -s` (silent mode to reduce the influence of
>    the unrelated things, like console output) 10 times and
>    measure 'real' time spent.
> 
> 4. Repeat 1-3 for each patch or patch set to get data sets before
>    and after.
> 
> When we get the raw data, calculating median will show us the number.
> Comparing them before and after we will see the difference.
> 
> The setup
> =========
> I have used the Intel x86_64 server platform (see partial output of
>  `lscpu` below):
> 
> $ lscpu
> Architecture:            x86_64
>   CPU op-mode(s):        32-bit, 64-bit
>   Address sizes:         46 bits physical, 48 bits virtual
>   Byte Order:            Little Endian
> CPU(s):                  88
>   On-line CPU(s) list:   0-87
> Vendor ID:               GenuineIntel
>   Model name:            Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
>     CPU family:          6
>     Model:               79
>     Thread(s) per core:  2
>     Core(s) per socket:  22
>     Socket(s):           2
>     Stepping:            1
>     CPU max MHz:         3600.0000
>     CPU min MHz:         1200.0000
> ...
> Caches (sum of all):
>   L1d:                   1.4 MiB (44 instances)
>   L1i:                   1.4 MiB (44 instances)
>   L2:                    11 MiB (44 instances)
>   L3:                    110 MiB (2 instances)
> NUMA:
>   NUMA node(s):          2
>   NUMA node0 CPU(s):     0-21,44-65
>   NUMA node1 CPU(s):     22-43,66-87
> Vulnerabilities:
>   Itlb multihit:         KVM: Mitigation: Split huge pages
>   L1tf:                  Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
>   Mds:                   Mitigation; Clear CPU buffers; SMT vulnerable
>   Meltdown:              Mitigation; PTI
>   Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
>   Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
>   Spectre v2:            Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
>   Tsx async abort:       Mitigation; Clear CPU buffers; SMT vulnerable
> 
> With the following GCC:
> 
> $ gcc --version
> gcc (Debian 10.3.0-11) 10.3.0
> 
> The commands I have run during the measurement were:
> 
> 	rm -rf $O
> 	make O=$O allyesconfig
> 	time make O=$O -s -j64	# this step has been measured
> 
> The raw data and median
> =======================
> Before patch 2 (yes, I have measured the only patch 2 effect) in the series
> (the data is sorted by time):
> 
> real    2m8.794s
> real    2m11.183s
> real    2m11.235s
> real    2m11.639s
> real    2m11.960s
> real    2m12.014s
> real    2m12.609s
> real    2m13.177s
> real    2m13.462s
> real    2m19.132s
> 
> After patch 2 has been applied:
> 
> real    2m8.536s
> real    2m8.776s
> real    2m9.071s
> real    2m9.459s
> real    2m9.531s
> real    2m9.610s
> real    2m10.356s
> real    2m10.430s
> real    2m11.117s
> real    2m11.885s
> 
> Median values are:
> 	131.987s before
> 	129.571s after
> 
> We see the steady speedup as of 1.83%.

You do know about kcbench:
	https://gitlab.com/knurd42/kcbench.git

Try running that to make it such that we know how it was tested :)

thanks,

greg k-h


> 
> Andy Shevchenko (4):
>   kernel.h: Drop unneeded <linux/kernel.h> inclusion from other headers
>   kernel.h: Split out container_of() and typeof_member() macros
>   lib/rhashtable: Replace kernel.h with the necessary inclusions
>   kunit: Replace kernel.h with the necessary inclusions
> 
>  include/kunit/test.h         | 14 ++++++++++++--
>  include/linux/container_of.h | 37 ++++++++++++++++++++++++++++++++++++
>  include/linux/kernel.h       | 31 +-----------------------------
>  include/linux/kobject.h      |  1 +
>  include/linux/list.h         |  6 ++++--
>  include/linux/llist.h        |  4 +++-
>  include/linux/plist.h        |  5 ++++-
>  include/linux/rwsem.h        |  1 -
>  include/linux/spinlock.h     |  1 -
>  include/media/media-entity.h |  3 ++-
>  lib/radix-tree.c             |  6 +++++-
>  lib/rhashtable.c             |  7 ++++++-
>  12 files changed, 75 insertions(+), 41 deletions(-)
>  create mode 100644 include/linux/container_of.h
> 
> -- 
> 2.33.0
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/4] kernel.h: Split out container_of() and typeof_member() macros
  2021-10-07  9:51 ` [PATCH v2 2/4] kernel.h: Split out container_of() and typeof_member() macros Andy Shevchenko
@ 2021-10-07 10:37   ` Greg Kroah-Hartman
  2021-10-07 10:38   ` Greg Kroah-Hartman
  1 sibling, 0 replies; 17+ messages in thread
From: Greg Kroah-Hartman @ 2021-10-07 10:37 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Peter Zijlstra, Thomas Gleixner, Randy Dunlap, linux-kernel,
	linux-kselftest, kunit-dev, linux-media, netdev, Brendan Higgins,
	Rafael J. Wysocki, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng, Sakari Ailus, Laurent Pinchart,
	Mauro Carvalho Chehab, Thomas Graf, Herbert Xu, Andrew Morton

On Thu, Oct 07, 2021 at 12:51:27PM +0300, Andy Shevchenko wrote:
> kernel.h is being used as a dump for all kinds of stuff for a long time.
> Here is the attempt cleaning it up by splitting out container_of() and
> typeof_member() macros.
> 
> At the same time convert users in the header and other folders to use it.
> Though for time being include new header back to kernel.h to avoid twisted
> indirected includes for existing users.
> 
> Note, there are _a lot_ of headers and modules that include kernel.h solely
> for one of these macros and this allows to unburden compiler for the twisted
> inclusion paths and to make new code cleaner in the future.
> 
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> ---
>  include/kunit/test.h         |  2 ++
>  include/linux/container_of.h | 37 ++++++++++++++++++++++++++++++++++++
>  include/linux/kernel.h       | 31 +-----------------------------
>  include/linux/kobject.h      |  1 +
>  include/linux/list.h         |  6 ++++--
>  include/linux/llist.h        |  4 +++-
>  include/linux/plist.h        |  5 ++++-
>  include/media/media-entity.h |  3 ++-
>  lib/radix-tree.c             |  6 +++++-
>  lib/rhashtable.c             |  1 +
>  10 files changed, 60 insertions(+), 36 deletions(-)
>  create mode 100644 include/linux/container_of.h
> 
> diff --git a/include/kunit/test.h b/include/kunit/test.h
> index 24b40e5c160b..4d498f496790 100644
> --- a/include/kunit/test.h
> +++ b/include/kunit/test.h
> @@ -11,6 +11,8 @@
>  
>  #include <kunit/assert.h>
>  #include <kunit/try-catch.h>
> +
> +#include <linux/container_of.h>
>  #include <linux/kernel.h>
>  #include <linux/module.h>
>  #include <linux/slab.h>
> diff --git a/include/linux/container_of.h b/include/linux/container_of.h
> new file mode 100644
> index 000000000000..f6ee1be0e784
> --- /dev/null
> +++ b/include/linux/container_of.h
> @@ -0,0 +1,37 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_CONTAINER_OF_H
> +#define _LINUX_CONTAINER_OF_H
> +
> +#define typeof_member(T, m)	typeof(((T*)0)->m)
> +
> +/**
> + * container_of - cast a member of a structure out to the containing structure
> + * @ptr:	the pointer to the member.
> + * @type:	the type of the container struct this is embedded in.
> + * @member:	the name of the member within the struct.
> + *
> + */
> +#define container_of(ptr, type, member) ({				\
> +	void *__mptr = (void *)(ptr);					\
> +	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
> +			 !__same_type(*(ptr), void),			\
> +			 "pointer type mismatch in container_of()");	\
> +	((type *)(__mptr - offsetof(type, member))); })
> +
> +/**
> + * container_of_safe - cast a member of a structure out to the containing structure
> + * @ptr:	the pointer to the member.
> + * @type:	the type of the container struct this is embedded in.
> + * @member:	the name of the member within the struct.
> + *
> + * If IS_ERR_OR_NULL(ptr), ptr is returned unchanged.
> + */
> +#define container_of_safe(ptr, type, member) ({				\
> +	void *__mptr = (void *)(ptr);					\
> +	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
> +			 !__same_type(*(ptr), void),			\
> +			 "pointer type mismatch in container_of()");	\
> +	IS_ERR_OR_NULL(__mptr) ? ERR_CAST(__mptr) :			\
> +		((type *)(__mptr - offsetof(type, member))); })
> +
> +#endif	/* _LINUX_CONTAINER_OF_H */
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index d416fe3165cb..ad9fdcce9dcf 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -9,6 +9,7 @@
>  #include <linux/stddef.h>
>  #include <linux/types.h>
>  #include <linux/compiler.h>
> +#include <linux/container_of.h>
>  #include <linux/bitops.h>
>  #include <linux/kstrtox.h>
>  #include <linux/log2.h>
> @@ -482,36 +483,6 @@ static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
>  #define __CONCAT(a, b) a ## b
>  #define CONCATENATE(a, b) __CONCAT(a, b)
>  
> -/**
> - * container_of - cast a member of a structure out to the containing structure
> - * @ptr:	the pointer to the member.
> - * @type:	the type of the container struct this is embedded in.
> - * @member:	the name of the member within the struct.
> - *
> - */
> -#define container_of(ptr, type, member) ({				\
> -	void *__mptr = (void *)(ptr);					\
> -	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
> -			 !__same_type(*(ptr), void),			\
> -			 "pointer type mismatch in container_of()");	\
> -	((type *)(__mptr - offsetof(type, member))); })
> -
> -/**
> - * container_of_safe - cast a member of a structure out to the containing structure
> - * @ptr:	the pointer to the member.
> - * @type:	the type of the container struct this is embedded in.
> - * @member:	the name of the member within the struct.
> - *
> - * If IS_ERR_OR_NULL(ptr), ptr is returned unchanged.
> - */
> -#define container_of_safe(ptr, type, member) ({				\
> -	void *__mptr = (void *)(ptr);					\
> -	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
> -			 !__same_type(*(ptr), void),			\
> -			 "pointer type mismatch in container_of()");	\
> -	IS_ERR_OR_NULL(__mptr) ? ERR_CAST(__mptr) :			\
> -		((type *)(__mptr - offsetof(type, member))); })
> -
>  /* Rebuild everything on CONFIG_FTRACE_MCOUNT_RECORD */
>  #ifdef CONFIG_FTRACE_MCOUNT_RECORD
>  # define REBUILD_DUE_TO_FTRACE_MCOUNT_RECORD
> diff --git a/include/linux/kobject.h b/include/linux/kobject.h
> index efd56f990a46..bf8371e58b17 100644
> --- a/include/linux/kobject.h
> +++ b/include/linux/kobject.h
> @@ -15,6 +15,7 @@
>  #ifndef _KOBJECT_H_
>  #define _KOBJECT_H_
>  
> +#include <linux/container_of.h>
>  #include <linux/types.h>
>  #include <linux/list.h>
>  #include <linux/sysfs.h>
> diff --git a/include/linux/list.h b/include/linux/list.h
> index f2af4b4aa4e9..5dc679b373da 100644
> --- a/include/linux/list.h
> +++ b/include/linux/list.h
> @@ -2,11 +2,13 @@
>  #ifndef _LINUX_LIST_H
>  #define _LINUX_LIST_H
>  
> +#include <linux/container_of.h>
> +#include <linux/const.h>
>  #include <linux/types.h>
>  #include <linux/stddef.h>
>  #include <linux/poison.h>
> -#include <linux/const.h>
> -#include <linux/kernel.h>
> +
> +#include <asm/barrier.h>
>  
>  /*
>   * Circular doubly linked list implementation.


This change looks odd.

You already have kernel.h including container_of.h, so why not have a
series that does:
	- create container_of.h and have kernel.h include it
	- multiple patches that remove kernel.h and use container_of.h
	  instead only.
	- multiple patches that remove kernel.h and use container_of.h
	  and other .h files (like list.h seems to need here.)
	- remove container_of.h from kernel.h

Mushing them all together here makes this really hard to understand why
this change is needed here.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/4] kernel.h: Split out container_of() and typeof_member() macros
  2021-10-07  9:51 ` [PATCH v2 2/4] kernel.h: Split out container_of() and typeof_member() macros Andy Shevchenko
  2021-10-07 10:37   ` Greg Kroah-Hartman
@ 2021-10-07 10:38   ` Greg Kroah-Hartman
  1 sibling, 0 replies; 17+ messages in thread
From: Greg Kroah-Hartman @ 2021-10-07 10:38 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Peter Zijlstra, Thomas Gleixner, Randy Dunlap, linux-kernel,
	linux-kselftest, kunit-dev, linux-media, netdev, Brendan Higgins,
	Rafael J. Wysocki, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng, Sakari Ailus, Laurent Pinchart,
	Mauro Carvalho Chehab, Thomas Graf, Herbert Xu, Andrew Morton

On Thu, Oct 07, 2021 at 12:51:27PM +0300, Andy Shevchenko wrote:
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -12,19 +12,21 @@
>  #include <linux/bitmap.h>
>  #include <linux/bitops.h>
>  #include <linux/bug.h>
> +#include <linux/container_of.h>
>  #include <linux/cpu.h>
>  #include <linux/errno.h>
>  #include <linux/export.h>
>  #include <linux/idr.h>
>  #include <linux/init.h>
> -#include <linux/kernel.h>
>  #include <linux/kmemleak.h>
> +#include <linux/math.h>
>  #include <linux/percpu.h>
>  #include <linux/preempt.h>		/* in_interrupt() */
>  #include <linux/radix-tree.h>
>  #include <linux/rcupdate.h>
>  #include <linux/slab.h>
>  #include <linux/string.h>
> +#include <linux/types.h>
>  #include <linux/xarray.h>
>  
>  /*
> @@ -285,6 +287,8 @@ radix_tree_node_alloc(gfp_t gfp_mask, struct radix_tree_node *parent,
>  	return ret;
>  }
>  
> +extern void radix_tree_node_rcu_free(struct rcu_head *head);

.c files should not need an extern, this belongs in a .h file somewhere,
or something really went wrong here...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 3/4] lib/rhashtable: Replace kernel.h with the necessary inclusions
  2021-10-07  9:51 ` [PATCH v2 3/4] lib/rhashtable: Replace kernel.h with the necessary inclusions Andy Shevchenko
@ 2021-10-07 11:23   ` Herbert Xu
  2021-10-07 11:44     ` Andy Shevchenko
  0 siblings, 1 reply; 17+ messages in thread
From: Herbert Xu @ 2021-10-07 11:23 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Peter Zijlstra, Thomas Gleixner, Randy Dunlap, linux-kernel,
	linux-kselftest, kunit-dev, linux-media, netdev, Brendan Higgins,
	Greg Kroah-Hartman, Rafael J. Wysocki, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Sakari Ailus, Laurent Pinchart,
	Mauro Carvalho Chehab, Thomas Graf, Andrew Morton

On Thu, Oct 07, 2021 at 12:51:28PM +0300, Andy Shevchenko wrote:
> When kernel.h is used in the headers it adds a lot into dependency hell,
> especially when there are circular dependencies are involved.
> 
> Replace kernel.h inclusion with the list of what is really being used.
> 
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> ---
>  lib/rhashtable.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index a422c7dd9126..01502cf77564 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -12,9 +12,13 @@
>   */
>  
>  #include <linux/atomic.h>
> +#include <linux/bit_spinlock.h>
>  #include <linux/container_of.h>
> -#include <linux/kernel.h>
> +#include <linux/err.h>
> +#include <linux/export.h>
>  #include <linux/init.h>
> +#include <linux/jhash.h>
> +#include <linux/lockdep.h>
>  #include <linux/log2.h>
>  #include <linux/sched.h>
>  #include <linux/rculist.h>

Nack.  I can see the benefits of splitting things out of kernel.h
but there is no reason to add crap to end users like rhashtable.c.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 3/4] lib/rhashtable: Replace kernel.h with the necessary inclusions
  2021-10-07 11:23   ` Herbert Xu
@ 2021-10-07 11:44     ` Andy Shevchenko
  2021-10-08  2:18       ` Herbert Xu
  0 siblings, 1 reply; 17+ messages in thread
From: Andy Shevchenko @ 2021-10-07 11:44 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Peter Zijlstra, Thomas Gleixner, Randy Dunlap, linux-kernel,
	linux-kselftest, kunit-dev, linux-media, netdev, Brendan Higgins,
	Greg Kroah-Hartman, Rafael J. Wysocki, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Sakari Ailus, Laurent Pinchart,
	Mauro Carvalho Chehab, Thomas Graf, Andrew Morton

On Thu, Oct 07, 2021 at 07:23:28PM +0800, Herbert Xu wrote:
> On Thu, Oct 07, 2021 at 12:51:28PM +0300, Andy Shevchenko wrote:
> > When kernel.h is used in the headers it adds a lot into dependency hell,
> > especially when there are circular dependencies are involved.

> > Replace kernel.h inclusion with the list of what is really being used.

> Nack.  I can see the benefits of splitting things out of kernel.h
> but there is no reason to add crap to end users like rhashtable.c.

Crap is in the kernel.h. Could you elaborate how making a proper list
of the inclusions is a crap?

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/4] kernel.h further split
  2021-10-07 10:33 ` [PATCH v2 0/4] kernel.h further split Greg Kroah-Hartman
@ 2021-10-07 11:51   ` Andy Shevchenko
  2021-10-07 13:59     ` Greg Kroah-Hartman
  2021-10-08  9:37     ` Thorsten Leemhuis
  0 siblings, 2 replies; 17+ messages in thread
From: Andy Shevchenko @ 2021-10-07 11:51 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Thorsten Leemhuis
  Cc: Andy Shevchenko, Peter Zijlstra, Thomas Gleixner, Randy Dunlap,
	Linux Kernel Mailing List, open list:KERNEL SELFTEST FRAMEWORK,
	KUnit Development, Linux Media Mailing List, netdev,
	Brendan Higgins, Rafael J. Wysocki, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Sakari Ailus, Laurent Pinchart,
	Mauro Carvalho Chehab, Thomas Graf, Herbert Xu, Andrew Morton

On Thu, Oct 7, 2021 at 1:34 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Thu, Oct 07, 2021 at 12:51:25PM +0300, Andy Shevchenko wrote:
> > The kernel.h is a set of something which is not related to each other
> > and often used in non-crossed compilation units, especially when drivers
> > need only one or two macro definitions from it.
> >
> > Here is the split of container_of(). The goals are the following:
> > - untwist the dependency hell a bit
> > - drop kernel.h inclusion where it's only used for container_of()
> > - speed up C preprocessing.
> >
> > People, like Greg KH and Miguel Ojeda, were asking about the latter.
> > Read below the methodology and test setup with outcome numbers.
> >
> > The methodology
> > ===============
> > The question here is how to measure in the more or less clean way
> > the C preprocessing time when building a project like Linux kernel.
> > To answer it, let's look around and see what tools do we have that
> > may help. Aha, here is ccache tool that seems quite plausible to
> > be used. Its core idea is to preprocess C file, count hash (MD4)
> > and compare to ones that are in the cache. If found, return the
> > object file, avoiding compilation stage.
> >
> > Taking into account the property of the ccache, configure and use
> > it in the below steps:
> >
> > 1. Configure kernel with allyesconfig
> >
> > 2. Make it with `make` to be sure that the cache is filled with
> >    the latest data. I.o.w. warm up the cache.
> >
> > 3. Run `make -s` (silent mode to reduce the influence of
> >    the unrelated things, like console output) 10 times and
> >    measure 'real' time spent.
> >
> > 4. Repeat 1-3 for each patch or patch set to get data sets before
> >    and after.
> >
> > When we get the raw data, calculating median will show us the number.
> > Comparing them before and after we will see the difference.
> >
> > The setup
> > =========
> > I have used the Intel x86_64 server platform (see partial output of
> >  `lscpu` below):
> >
> > $ lscpu
> > Architecture:            x86_64
> >   CPU op-mode(s):        32-bit, 64-bit
> >   Address sizes:         46 bits physical, 48 bits virtual
> >   Byte Order:            Little Endian
> > CPU(s):                  88
> >   On-line CPU(s) list:   0-87
> > Vendor ID:               GenuineIntel
> >   Model name:            Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> >     CPU family:          6
> >     Model:               79
> >     Thread(s) per core:  2
> >     Core(s) per socket:  22
> >     Socket(s):           2
> >     Stepping:            1
> >     CPU max MHz:         3600.0000
> >     CPU min MHz:         1200.0000
> > ...
> > Caches (sum of all):
> >   L1d:                   1.4 MiB (44 instances)
> >   L1i:                   1.4 MiB (44 instances)
> >   L2:                    11 MiB (44 instances)
> >   L3:                    110 MiB (2 instances)
> > NUMA:
> >   NUMA node(s):          2
> >   NUMA node0 CPU(s):     0-21,44-65
> >   NUMA node1 CPU(s):     22-43,66-87
> > Vulnerabilities:
> >   Itlb multihit:         KVM: Mitigation: Split huge pages
> >   L1tf:                  Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
> >   Mds:                   Mitigation; Clear CPU buffers; SMT vulnerable
> >   Meltdown:              Mitigation; PTI
> >   Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
> >   Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
> >   Spectre v2:            Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
> >   Tsx async abort:       Mitigation; Clear CPU buffers; SMT vulnerable
> >
> > With the following GCC:
> >
> > $ gcc --version
> > gcc (Debian 10.3.0-11) 10.3.0
> >
> > The commands I have run during the measurement were:
> >
> >       rm -rf $O
> >       make O=$O allyesconfig
> >       time make O=$O -s -j64  # this step has been measured
> >
> > The raw data and median
> > =======================
> > Before patch 2 (yes, I have measured the only patch 2 effect) in the series
> > (the data is sorted by time):
> >
> > real    2m8.794s
> > real    2m11.183s
> > real    2m11.235s
> > real    2m11.639s
> > real    2m11.960s
> > real    2m12.014s
> > real    2m12.609s
> > real    2m13.177s
> > real    2m13.462s
> > real    2m19.132s
> >
> > After patch 2 has been applied:
> >
> > real    2m8.536s
> > real    2m8.776s
> > real    2m9.071s
> > real    2m9.459s
> > real    2m9.531s
> > real    2m9.610s
> > real    2m10.356s
> > real    2m10.430s
> > real    2m11.117s
> > real    2m11.885s
> >
> > Median values are:
> >       131.987s before
> >       129.571s after
> >
> > We see the steady speedup as of 1.83%.
>
> You do know about kcbench:
>         https://gitlab.com/knurd42/kcbench.git
>
> Try running that to make it such that we know how it was tested :)

I'll try it.

Meanwhile, Thorsten, can you have a look at my approach and tell if it
makes sense?

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/4] kernel.h further split
  2021-10-07 11:51   ` Andy Shevchenko
@ 2021-10-07 13:59     ` Greg Kroah-Hartman
  2021-10-07 14:47       ` Andy Shevchenko
  2021-10-08  9:37     ` Thorsten Leemhuis
  1 sibling, 1 reply; 17+ messages in thread
From: Greg Kroah-Hartman @ 2021-10-07 13:59 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Thorsten Leemhuis, Andy Shevchenko, Peter Zijlstra,
	Thomas Gleixner, Randy Dunlap, Linux Kernel Mailing List,
	open list:KERNEL SELFTEST FRAMEWORK, KUnit Development,
	Linux Media Mailing List, netdev, Brendan Higgins,
	Rafael J. Wysocki, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng, Sakari Ailus, Laurent Pinchart,
	Mauro Carvalho Chehab, Thomas Graf, Herbert Xu, Andrew Morton

On Thu, Oct 07, 2021 at 02:51:15PM +0300, Andy Shevchenko wrote:
> On Thu, Oct 7, 2021 at 1:34 PM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > On Thu, Oct 07, 2021 at 12:51:25PM +0300, Andy Shevchenko wrote:
> > > The kernel.h is a set of something which is not related to each other
> > > and often used in non-crossed compilation units, especially when drivers
> > > need only one or two macro definitions from it.
> > >
> > > Here is the split of container_of(). The goals are the following:
> > > - untwist the dependency hell a bit
> > > - drop kernel.h inclusion where it's only used for container_of()
> > > - speed up C preprocessing.
> > >
> > > People, like Greg KH and Miguel Ojeda, were asking about the latter.
> > > Read below the methodology and test setup with outcome numbers.
> > >
> > > The methodology
> > > ===============
> > > The question here is how to measure in the more or less clean way
> > > the C preprocessing time when building a project like Linux kernel.
> > > To answer it, let's look around and see what tools do we have that
> > > may help. Aha, here is ccache tool that seems quite plausible to
> > > be used. Its core idea is to preprocess C file, count hash (MD4)
> > > and compare to ones that are in the cache. If found, return the
> > > object file, avoiding compilation stage.
> > >
> > > Taking into account the property of the ccache, configure and use
> > > it in the below steps:
> > >
> > > 1. Configure kernel with allyesconfig
> > >
> > > 2. Make it with `make` to be sure that the cache is filled with
> > >    the latest data. I.o.w. warm up the cache.
> > >
> > > 3. Run `make -s` (silent mode to reduce the influence of
> > >    the unrelated things, like console output) 10 times and
> > >    measure 'real' time spent.
> > >
> > > 4. Repeat 1-3 for each patch or patch set to get data sets before
> > >    and after.
> > >
> > > When we get the raw data, calculating median will show us the number.
> > > Comparing them before and after we will see the difference.
> > >
> > > The setup
> > > =========
> > > I have used the Intel x86_64 server platform (see partial output of
> > >  `lscpu` below):
> > >
> > > $ lscpu
> > > Architecture:            x86_64
> > >   CPU op-mode(s):        32-bit, 64-bit
> > >   Address sizes:         46 bits physical, 48 bits virtual
> > >   Byte Order:            Little Endian
> > > CPU(s):                  88
> > >   On-line CPU(s) list:   0-87
> > > Vendor ID:               GenuineIntel
> > >   Model name:            Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> > >     CPU family:          6
> > >     Model:               79
> > >     Thread(s) per core:  2
> > >     Core(s) per socket:  22
> > >     Socket(s):           2
> > >     Stepping:            1
> > >     CPU max MHz:         3600.0000
> > >     CPU min MHz:         1200.0000
> > > ...
> > > Caches (sum of all):
> > >   L1d:                   1.4 MiB (44 instances)
> > >   L1i:                   1.4 MiB (44 instances)
> > >   L2:                    11 MiB (44 instances)
> > >   L3:                    110 MiB (2 instances)
> > > NUMA:
> > >   NUMA node(s):          2
> > >   NUMA node0 CPU(s):     0-21,44-65
> > >   NUMA node1 CPU(s):     22-43,66-87
> > > Vulnerabilities:
> > >   Itlb multihit:         KVM: Mitigation: Split huge pages
> > >   L1tf:                  Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
> > >   Mds:                   Mitigation; Clear CPU buffers; SMT vulnerable
> > >   Meltdown:              Mitigation; PTI
> > >   Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
> > >   Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
> > >   Spectre v2:            Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
> > >   Tsx async abort:       Mitigation; Clear CPU buffers; SMT vulnerable
> > >
> > > With the following GCC:
> > >
> > > $ gcc --version
> > > gcc (Debian 10.3.0-11) 10.3.0
> > >
> > > The commands I have run during the measurement were:
> > >
> > >       rm -rf $O
> > >       make O=$O allyesconfig
> > >       time make O=$O -s -j64  # this step has been measured
> > >
> > > The raw data and median
> > > =======================
> > > Before patch 2 (yes, I have measured the only patch 2 effect) in the series
> > > (the data is sorted by time):
> > >
> > > real    2m8.794s
> > > real    2m11.183s
> > > real    2m11.235s
> > > real    2m11.639s
> > > real    2m11.960s
> > > real    2m12.014s
> > > real    2m12.609s
> > > real    2m13.177s
> > > real    2m13.462s
> > > real    2m19.132s
> > >
> > > After patch 2 has been applied:
> > >
> > > real    2m8.536s
> > > real    2m8.776s
> > > real    2m9.071s
> > > real    2m9.459s
> > > real    2m9.531s
> > > real    2m9.610s
> > > real    2m10.356s
> > > real    2m10.430s
> > > real    2m11.117s
> > > real    2m11.885s
> > >
> > > Median values are:
> > >       131.987s before
> > >       129.571s after
> > >
> > > We see the steady speedup as of 1.83%.
> >
> > You do know about kcbench:
> >         https://gitlab.com/knurd42/kcbench.git
> >
> > Try running that to make it such that we know how it was tested :)
> 
> I'll try it.
> 
> Meanwhile, Thorsten, can you have a look at my approach and tell if it
> makes sense?

No, do not use ccache when trying to benchmark the speed of kernel
builds, that tests the speed of your disk subsystem...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/4] kernel.h further split
  2021-10-07 13:59     ` Greg Kroah-Hartman
@ 2021-10-07 14:47       ` Andy Shevchenko
  2021-10-07 15:07         ` Greg Kroah-Hartman
  0 siblings, 1 reply; 17+ messages in thread
From: Andy Shevchenko @ 2021-10-07 14:47 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Thorsten Leemhuis, Peter Zijlstra, Thomas Gleixner, Randy Dunlap,
	Linux Kernel Mailing List, open list:KERNEL SELFTEST FRAMEWORK,
	KUnit Development, Linux Media Mailing List, netdev,
	Brendan Higgins, Rafael J. Wysocki, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Sakari Ailus, Laurent Pinchart,
	Mauro Carvalho Chehab, Thomas Graf, Herbert Xu, Andrew Morton

On Thu, Oct 07, 2021 at 03:59:08PM +0200, Greg Kroah-Hartman wrote:
> On Thu, Oct 07, 2021 at 02:51:15PM +0300, Andy Shevchenko wrote:
> > On Thu, Oct 7, 2021 at 1:34 PM Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:

...

> > Meanwhile, Thorsten, can you have a look at my approach and tell if it
> > makes sense?
> 
> No, do not use ccache when trying to benchmark the speed of kernel
> builds, that tests the speed of your disk subsystem...

First rule of the measurement is to be sure WHAT we are measuring.
And I'm pretty much explained WHAT and HOW. On the other hand, the
kcbench can't answer to the question about C preprocessing speed
without help of ccache or something similar.

Measuring complete build is exactly not what we want because of
O(compilation) vs. o(C preprocessing) meaning that any fluctuation
in the former makes silly to measure anything from the latter.

You see, my theory is proved by practical experiment:

$ kcbench -i 3 -j 64 -o $O -s $PWD --no-download -m
Processor:           Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz [88 CPUs]
Cpufreq; Memory:     powersave [intel_pstate]; 128823 MiB
Linux running:       5.6.0-2-amd64 [x86_64]
Compiler:            gcc (Debian 10.3.0-11) 10.3.0
Linux compiled:      5.15.0-rc4
Config; Environment: allmodconfig; CCACHE_DISABLE="1"
Build command:       make vmlinux modules
Filling caches:      This might take a while... Done
Run 1 (-j 64):       464.07 seconds / 7.76 kernels/hour [P:6001%]
Run 2 (-j 64):       464.64 seconds / 7.75 kernels/hour [P:6000%]
Run 3 (-j 64):       486.41 seconds / 7.40 kernels/hour [P:5727%]

$ kcbench -i 3 -j 64 -o $O -s $PWD --no-download -m
Processor:           Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz [88 CPUs]
Cpufreq; Memory:     powersave [intel_pstate]; 128823 MiB
Linux running:       5.6.0-2-amd64 [x86_64]
Compiler:            gcc (Debian 10.3.0-11) 10.3.0
Linux compiled:      5.15.0-rc4
Config; Environment: allmodconfig; CCACHE_DISABLE="1"
Build command:       make vmlinux modules
Filling caches:      This might take a while... Done
Run 1 (-j 64):       462.32 seconds / 7.79 kernels/hour [P:6009%]
Run 2 (-j 64):       462.33 seconds / 7.79 kernels/hour [P:6006%]
Run 3 (-j 64):       465.45 seconds / 7.73 kernels/hour [P:5999%]

In [41]: numpy.median(y1)
Out[41]: 464.64

In [42]: numpy.median(y2)
Out[42]: 462.33

Speedup: +0.5%

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/4] kernel.h further split
  2021-10-07 14:47       ` Andy Shevchenko
@ 2021-10-07 15:07         ` Greg Kroah-Hartman
  0 siblings, 0 replies; 17+ messages in thread
From: Greg Kroah-Hartman @ 2021-10-07 15:07 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Thorsten Leemhuis, Peter Zijlstra, Thomas Gleixner, Randy Dunlap,
	Linux Kernel Mailing List, open list:KERNEL SELFTEST FRAMEWORK,
	KUnit Development, Linux Media Mailing List, netdev,
	Brendan Higgins, Rafael J. Wysocki, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Sakari Ailus, Laurent Pinchart,
	Mauro Carvalho Chehab, Thomas Graf, Herbert Xu, Andrew Morton

On Thu, Oct 07, 2021 at 05:47:31PM +0300, Andy Shevchenko wrote:
> On Thu, Oct 07, 2021 at 03:59:08PM +0200, Greg Kroah-Hartman wrote:
> > On Thu, Oct 07, 2021 at 02:51:15PM +0300, Andy Shevchenko wrote:
> > > On Thu, Oct 7, 2021 at 1:34 PM Greg Kroah-Hartman
> > > <gregkh@linuxfoundation.org> wrote:
> 
> ...
> 
> > > Meanwhile, Thorsten, can you have a look at my approach and tell if it
> > > makes sense?
> > 
> > No, do not use ccache when trying to benchmark the speed of kernel
> > builds, that tests the speed of your disk subsystem...
> 
> First rule of the measurement is to be sure WHAT we are measuring.
> And I'm pretty much explained WHAT and HOW. On the other hand, the
> kcbench can't answer to the question about C preprocessing speed
> without help of ccache or something similar.
> 
> Measuring complete build is exactly not what we want because of
> O(compilation) vs. o(C preprocessing) meaning that any fluctuation
> in the former makes silly to measure anything from the latter.
> 
> You see, my theory is proved by practical experiment:
> 
> $ kcbench -i 3 -j 64 -o $O -s $PWD --no-download -m
> Processor:           Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz [88 CPUs]
> Cpufreq; Memory:     powersave [intel_pstate]; 128823 MiB
> Linux running:       5.6.0-2-amd64 [x86_64]
> Compiler:            gcc (Debian 10.3.0-11) 10.3.0
> Linux compiled:      5.15.0-rc4
> Config; Environment: allmodconfig; CCACHE_DISABLE="1"
> Build command:       make vmlinux modules
> Filling caches:      This might take a while... Done
> Run 1 (-j 64):       464.07 seconds / 7.76 kernels/hour [P:6001%]
> Run 2 (-j 64):       464.64 seconds / 7.75 kernels/hour [P:6000%]
> Run 3 (-j 64):       486.41 seconds / 7.40 kernels/hour [P:5727%]
> 
> $ kcbench -i 3 -j 64 -o $O -s $PWD --no-download -m
> Processor:           Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz [88 CPUs]
> Cpufreq; Memory:     powersave [intel_pstate]; 128823 MiB
> Linux running:       5.6.0-2-amd64 [x86_64]
> Compiler:            gcc (Debian 10.3.0-11) 10.3.0
> Linux compiled:      5.15.0-rc4
> Config; Environment: allmodconfig; CCACHE_DISABLE="1"
> Build command:       make vmlinux modules
> Filling caches:      This might take a while... Done
> Run 1 (-j 64):       462.32 seconds / 7.79 kernels/hour [P:6009%]
> Run 2 (-j 64):       462.33 seconds / 7.79 kernels/hour [P:6006%]
> Run 3 (-j 64):       465.45 seconds / 7.73 kernels/hour [P:5999%]
> 
> In [41]: numpy.median(y1)
> Out[41]: 464.64
> 
> In [42]: numpy.median(y2)
> Out[42]: 462.33
> 
> Speedup: +0.5%

Good, you measured what actually matters here, the real compilation of
the code, not just the pre-processing of it.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 3/4] lib/rhashtable: Replace kernel.h with the necessary inclusions
  2021-10-07 11:44     ` Andy Shevchenko
@ 2021-10-08  2:18       ` Herbert Xu
  0 siblings, 0 replies; 17+ messages in thread
From: Herbert Xu @ 2021-10-08  2:18 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Peter Zijlstra, Thomas Gleixner, Randy Dunlap, linux-kernel,
	linux-kselftest, kunit-dev, linux-media, netdev, Brendan Higgins,
	Greg Kroah-Hartman, Rafael J. Wysocki, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Sakari Ailus, Laurent Pinchart,
	Mauro Carvalho Chehab, Thomas Graf, Andrew Morton

On Thu, Oct 07, 2021 at 02:44:41PM +0300, Andy Shevchenko wrote:
>
> Crap is in the kernel.h. Could you elaborate how making a proper list
> of the inclusions is a crap?

Unless you're planning on not including all those header files from
kernel.h, then adding them all to an end node like rhashtable.c is
just a waste of time.

You should be targetting other header files and not c files.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/4] kernel.h further split
  2021-10-07 11:51   ` Andy Shevchenko
  2021-10-07 13:59     ` Greg Kroah-Hartman
@ 2021-10-08  9:37     ` Thorsten Leemhuis
  2021-10-13 10:31       ` Andy Shevchenko
  1 sibling, 1 reply; 17+ messages in thread
From: Thorsten Leemhuis @ 2021-10-08  9:37 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Greg Kroah-Hartman, Andy Shevchenko, Peter Zijlstra,
	Thomas Gleixner, Randy Dunlap, Linux Kernel Mailing List,
	open list:KERNEL SELFTEST FRAMEWORK, KUnit Development,
	Linux Media Mailing List, netdev, Brendan Higgins,
	Rafael J. Wysocki, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng, Sakari Ailus, Laurent Pinchart,
	Mauro Carvalho Chehab, Thomas Graf, Herbert Xu, Andrew Morton

(sorry, sending it a second time with a different mail client, as vger
rejected my earlier mail with the "Content-Policy reject msg: Wrong
MIME labeling on 8-bit character texts." – and as of now I'm unable to
figure out what's wrong :-/ )

On Thu, 7 Oct 2021 14:51:15 +0300
Andy Shevchenko <andy.shevchenko@gmail.com> wrote:

> On Thu, Oct 7, 2021 at 1:34 PM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > On Thu, Oct 07, 2021 at 12:51:25PM +0300, Andy Shevchenko wrote:  
> > > The kernel.h is a set of something which is not related to each
> > > other and often used in non-crossed compilation units, especially
> > > when drivers need only one or two macro definitions from it.
> > >
> > > Here is the split of container_of(). The goals are the following:
> > > - untwist the dependency hell a bit
> > > - drop kernel.h inclusion where it's only used for container_of()
> > > - speed up C preprocessing.
> > >
> > > People, like Greg KH and Miguel Ojeda, were asking about the
> > > latter. Read below the methodology and test setup with outcome
> > > numbers.
> > >
> > > The methodology
> > > ===============
> > > The question here is how to measure in the more or less clean way
> > > the C preprocessing time when building a project like Linux
> > > kernel. To answer it, let's look around and see what tools do we
> > > have that may help. Aha, here is ccache tool that seems quite
> > > plausible to be used. Its core idea is to preprocess C file,
> > > count hash (MD4) and compare to ones that are in the cache. If
> > > found, return the object file, avoiding compilation stage.
> > >
> > > Taking into account the property of the ccache, configure and use
> > > it in the below steps:
> > >
> > > 1. Configure kernel with allyesconfig
> > >
> > > 2. Make it with `make` to be sure that the cache is filled with
> > >    the latest data. I.o.w. warm up the cache.
> > >
> > > 3. Run `make -s` (silent mode to reduce the influence of
> > >    the unrelated things, like console output) 10 times and
> > >    measure 'real' time spent.
> > >
> > > 4. Repeat 1-3 for each patch or patch set to get data sets before
> > >    and after.
> > >
> > > When we get the raw data, calculating median will show us the
> > > number. Comparing them before and after we will see the
> > > difference.
> > >
> > > The setup
> > > =========
> > > I have used the Intel x86_64 server platform (see partial output
> > > of `lscpu` below):
> > >
> > > $ lscpu
> > > Architecture:            x86_64
> > >   CPU op-mode(s):        32-bit, 64-bit
> > >   Address sizes:         46 bits physical, 48 bits virtual
> > >   Byte Order:            Little Endian
> > > CPU(s):                  88
> > >   On-line CPU(s) list:   0-87
> > > Vendor ID:               GenuineIntel
> > >   Model name:            Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> > >     CPU family:          6
> > >     Model:               79
> > >     Thread(s) per core:  2
> > >     Core(s) per socket:  22
> > >     Socket(s):           2
> > >     Stepping:            1
> > >     CPU max MHz:         3600.0000
> > >     CPU min MHz:         1200.0000
> > > ...
> > > Caches (sum of all):
> > >   L1d:                   1.4 MiB (44 instances)
> > >   L1i:                   1.4 MiB (44 instances)
> > >   L2:                    11 MiB (44 instances)
> > >   L3:                    110 MiB (2 instances)
> > > NUMA:
> > >   NUMA node(s):          2
> > >   NUMA node0 CPU(s):     0-21,44-65
> > >   NUMA node1 CPU(s):     22-43,66-87
> > > Vulnerabilities:
> > >   Itlb multihit:         KVM: Mitigation: Split huge pages
> > >   L1tf:                  Mitigation; PTE Inversion; VMX
> > > conditional cache flushes, SMT vulnerable Mds:
> > > Mitigation; Clear CPU buffers; SMT vulnerable Meltdown:
> > >    Mitigation; PTI Spec store bypass:     Mitigation; Speculative
> > > Store Bypass disabled via prctl and seccomp Spectre v1:
> > >  Mitigation; usercopy/swapgs barriers and __user pointer
> > > sanitization Spectre v2:            Mitigation; Full generic
> > > retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB
> > > filling Tsx async abort:       Mitigation; Clear CPU buffers; SMT
> > > vulnerable
> > >
> > > With the following GCC:
> > >
> > > $ gcc --version
> > > gcc (Debian 10.3.0-11) 10.3.0
> > >
> > > The commands I have run during the measurement were:
> > >
> > >       rm -rf $O
> > >       make O=$O allyesconfig
> > >       time make O=$O -s -j64  # this step has been measured

BTW, what kcbench does in the end is not that different, but it only
builds the config once and that uses it for all further testing. 

> > > The raw data and median
> > > =======================
> > > Before patch 2 (yes, I have measured the only patch 2 effect) in
> > > the series (the data is sorted by time):
> > >
> > > real    2m8.794s
> > > real    2m11.183s
> > > real    2m11.235s
> > > real    2m11.639s
> > > real    2m11.960s
> > > real    2m12.014s
> > > real    2m12.609s
> > > real    2m13.177s
> > > real    2m13.462s
> > > real    2m19.132s
> > >
> > > After patch 2 has been applied:
> > >
> > > real    2m8.536s
> > > real    2m8.776s
> > > real    2m9.071s
> > > real    2m9.459s
> > > real    2m9.531s
> > > real    2m9.610s
> > > real    2m10.356s
> > > real    2m10.430s
> > > real    2m11.117s
> > > real    2m11.885s
> > >
> > > Median values are:
> > >       131.987s before
> > >       129.571s after
> > >
> > > We see the steady speedup as of 1.83%.  
> >
> > You do know about kcbench:
> >         https://gitlab.com/knurd42/kcbench.git
> >
> > Try running that to make it such that we know how it was tested :)  
> 
> I'll try it.
> 
> Meanwhile, Thorsten, can you have a look at my approach and tell if it
> makes sense?

I'm not the right person to ask here, I don't know enough about the
inner working of ccache and C preprocessing. Reminder: I'm not a real
kernel/C developer, but more kind of a parasite that lives on the
fringes of kernel development. ;-) Kcbench in fact originated as a
benchmark magazine for the computer magazine I used to work for – where
I also did quite a few benchmarks. But that knowledge might be helpful
here:

The measurements before and after patch 2 was applied get slower over
time. That is a hint that something is interfering. Is the disk filling
up and making the fs do more work? Or is the machine getting to hot? It
IMHO would be worth investigating and ruling out, as the differences
you are looking out are likely quite small

Also: the last run of the first measurement cycle is off by quite a
bit, so I wouldn't even include the result, as there like was something
that disturbed the benchmark.

And I might be missing something, but why were you using "-j 64" on a
machine with 44 cores/88 threads? I wonder if that might lead do
interesting effects due to SMT (some core will run two threads, other
only one). Using either "-j 44" or "-j 88" might be better. But I
suggest you run kcbench once without specifying "-j", as that will
check which setting is the fastest on this system – and then use that
for all further tests.

HTH, Ciao, Thorsten 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/4] kernel.h further split
  2021-10-08  9:37     ` Thorsten Leemhuis
@ 2021-10-13 10:31       ` Andy Shevchenko
  0 siblings, 0 replies; 17+ messages in thread
From: Andy Shevchenko @ 2021-10-13 10:31 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: Greg Kroah-Hartman, Peter Zijlstra, Thomas Gleixner,
	Randy Dunlap, Linux Kernel Mailing List,
	open list:KERNEL SELFTEST FRAMEWORK, KUnit Development,
	Linux Media Mailing List, netdev, Brendan Higgins,
	Rafael J. Wysocki, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng, Sakari Ailus, Laurent Pinchart,
	Mauro Carvalho Chehab, Thomas Graf, Herbert Xu, Andrew Morton

On Fri, Oct 08, 2021 at 11:37:58AM +0200, Thorsten Leemhuis wrote:
> On Thu, 7 Oct 2021 14:51:15 +0300
> Andy Shevchenko <andy.shevchenko@gmail.com> wrote:
> > On Thu, Oct 7, 2021 at 1:34 PM Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > > On Thu, Oct 07, 2021 at 12:51:25PM +0300, Andy Shevchenko wrote:
> > > > The kernel.h is a set of something which is not related to each
> > > > other and often used in non-crossed compilation units, especially
> > > > when drivers need only one or two macro definitions from it.
> > > >
> > > > Here is the split of container_of(). The goals are the following:
> > > > - untwist the dependency hell a bit
> > > > - drop kernel.h inclusion where it's only used for container_of()
> > > > - speed up C preprocessing.
> > > >
> > > > People, like Greg KH and Miguel Ojeda, were asking about the
> > > > latter. Read below the methodology and test setup with outcome
> > > > numbers.
> > > >
> > > > The methodology
> > > > ===============
> > > > The question here is how to measure in the more or less clean way
> > > > the C preprocessing time when building a project like Linux
> > > > kernel. To answer it, let's look around and see what tools do we
> > > > have that may help. Aha, here is ccache tool that seems quite
> > > > plausible to be used. Its core idea is to preprocess C file,
> > > > count hash (MD4) and compare to ones that are in the cache. If
> > > > found, return the object file, avoiding compilation stage.
> > > >
> > > > Taking into account the property of the ccache, configure and use
> > > > it in the below steps:
> > > >
> > > > 1. Configure kernel with allyesconfig
> > > >
> > > > 2. Make it with `make` to be sure that the cache is filled with
> > > >    the latest data. I.o.w. warm up the cache.
> > > >
> > > > 3. Run `make -s` (silent mode to reduce the influence of
> > > >    the unrelated things, like console output) 10 times and
> > > >    measure 'real' time spent.
> > > >
> > > > 4. Repeat 1-3 for each patch or patch set to get data sets before
> > > >    and after.
> > > >
> > > > When we get the raw data, calculating median will show us the
> > > > number. Comparing them before and after we will see the
> > > > difference.
> > > >
> > > > The setup
> > > > =========
> > > > I have used the Intel x86_64 server platform (see partial output
> > > > of `lscpu` below):
> > > >
> > > > $ lscpu
> > > > Architecture:            x86_64
> > > >   CPU op-mode(s):        32-bit, 64-bit
> > > >   Address sizes:         46 bits physical, 48 bits virtual
> > > >   Byte Order:            Little Endian
> > > > CPU(s):                  88
> > > >   On-line CPU(s) list:   0-87
> > > > Vendor ID:               GenuineIntel
> > > >   Model name:            Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> > > >     CPU family:          6
> > > >     Model:               79
> > > >     Thread(s) per core:  2
> > > >     Core(s) per socket:  22
> > > >     Socket(s):           2
> > > >     Stepping:            1
> > > >     CPU max MHz:         3600.0000
> > > >     CPU min MHz:         1200.0000
> > > > ...
> > > > Caches (sum of all):
> > > >   L1d:                   1.4 MiB (44 instances)
> > > >   L1i:                   1.4 MiB (44 instances)
> > > >   L2:                    11 MiB (44 instances)
> > > >   L3:                    110 MiB (2 instances)
> > > > NUMA:
> > > >   NUMA node(s):          2
> > > >   NUMA node0 CPU(s):     0-21,44-65
> > > >   NUMA node1 CPU(s):     22-43,66-87
> > > > Vulnerabilities:
> > > >   Itlb multihit:         KVM: Mitigation: Split huge pages
> > > >   L1tf:                  Mitigation; PTE Inversion; VMX
> > > > conditional cache flushes, SMT vulnerable Mds:
> > > > Mitigation; Clear CPU buffers; SMT vulnerable Meltdown:
> > > >    Mitigation; PTI Spec store bypass:     Mitigation; Speculative
> > > > Store Bypass disabled via prctl and seccomp Spectre v1:
> > > >  Mitigation; usercopy/swapgs barriers and __user pointer
> > > > sanitization Spectre v2:            Mitigation; Full generic
> > > > retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB
> > > > filling Tsx async abort:       Mitigation; Clear CPU buffers; SMT
> > > > vulnerable
> > > >
> > > > With the following GCC:
> > > >
> > > > $ gcc --version
> > > > gcc (Debian 10.3.0-11) 10.3.0
> > > >
> > > > The commands I have run during the measurement were:
> > > >
> > > >       rm -rf $O
> > > >       make O=$O allyesconfig
> > > >       time make O=$O -s -j64  # this step has been measured
> 
> BTW, what kcbench does in the end is not that different, but it only
> builds the config once and that uses it for all further testing.

Since I measure the third operation only this shouldn't affect recreation
of the configuration file.

> > > > The raw data and median
> > > > =======================
> > > > Before patch 2 (yes, I have measured the only patch 2 effect) in
> > > > the series (the data is sorted by time):
> > > >
> > > > real    2m8.794s
> > > > real    2m11.183s
> > > > real    2m11.235s
> > > > real    2m11.639s
> > > > real    2m11.960s
> > > > real    2m12.014s
> > > > real    2m12.609s
> > > > real    2m13.177s
> > > > real    2m13.462s
> > > > real    2m19.132s
> > > >
> > > > After patch 2 has been applied:
> > > >
> > > > real    2m8.536s
> > > > real    2m8.776s
> > > > real    2m9.071s
> > > > real    2m9.459s
> > > > real    2m9.531s
> > > > real    2m9.610s
> > > > real    2m10.356s
> > > > real    2m10.430s
> > > > real    2m11.117s
> > > > real    2m11.885s
> > > >
> > > > Median values are:
> > > >       131.987s before
> > > >       129.571s after
> > > >
> > > > We see the steady speedup as of 1.83%.
> > >
> > > You do know about kcbench:
> > >         https://gitlab.com/knurd42/kcbench.git
> > >
> > > Try running that to make it such that we know how it was tested :)
> > 
> > I'll try it.
> > 
> > Meanwhile, Thorsten, can you have a look at my approach and tell if it
> > makes sense?
> 
> I'm not the right person to ask here, I don't know enough about the
> inner working of ccache and C preprocessing. Reminder: I'm not a real
> kernel/C developer, but more kind of a parasite that lives on the
> fringes of kernel development. ;-) Kcbench in fact originated as a
> benchmark magazine for the computer magazine I used to work for – where
> I also did quite a few benchmarks. But that knowledge might be helpful
> here:
> 
> The measurements before and after patch 2 was applied get slower over
> time. That is a hint that something is interfering. Is the disk filling
> up and making the fs do more work? Or is the machine getting to hot? It
> IMHO would be worth investigating and ruling out, as the differences
> you are looking out are likely quite small

I tried to explain why my methodology is closer to what we need to measure
in the above and replies. TL;DR: mathematically the O() shadows o() and as
we know the CPU and disk usage during compilation is a huge in comparison
to the C preprocessing. I'm not sure what you are referring by "slower
over time" since I explicitly said that I have _sorted_ the data. Nothing
should be done here, I believe.

> Also: the last run of the first measurement cycle is off by quite a
> bit, so I wouldn't even include the result, as there like was something
> that disturbed the benchmark.

I believe you missed the very same remark, i.e. that the data is sorted.

> And I might be missing something, but why were you using "-j 64" on a
> machine with 44 cores/88 threads?

Because that machine has more processes being run. And I would like to
minimize fluctuation of the CPU scheduling when some process requires
a resource to perform little work.

> I wonder if that might lead do
> interesting effects due to SMT (some core will run two threads, other
> only one). Using either "-j 44" or "-j 88" might be better.

How -j64 can be better? Nothing will guarantee that any of the core will
be half-loaded. But -j88 is worse because any process that wakes up and
requires for a resource may affect the measurements.

> But I
> suggest you run kcbench once without specifying "-j", as that will
> check which setting is the fastest on this system – and then use that
> for all further tests.

Next time I will try this approach, thanks for your reply and insights!

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-10-13 10:31 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-07  9:51 [PATCH v2 0/4] kernel.h further split Andy Shevchenko
2021-10-07  9:51 ` [PATCH v2 1/4] kernel.h: Drop unneeded <linux/kernel.h> inclusion from other headers Andy Shevchenko
2021-10-07  9:51 ` [PATCH v2 2/4] kernel.h: Split out container_of() and typeof_member() macros Andy Shevchenko
2021-10-07 10:37   ` Greg Kroah-Hartman
2021-10-07 10:38   ` Greg Kroah-Hartman
2021-10-07  9:51 ` [PATCH v2 3/4] lib/rhashtable: Replace kernel.h with the necessary inclusions Andy Shevchenko
2021-10-07 11:23   ` Herbert Xu
2021-10-07 11:44     ` Andy Shevchenko
2021-10-08  2:18       ` Herbert Xu
2021-10-07  9:51 ` [PATCH v2 4/4] kunit: " Andy Shevchenko
2021-10-07 10:33 ` [PATCH v2 0/4] kernel.h further split Greg Kroah-Hartman
2021-10-07 11:51   ` Andy Shevchenko
2021-10-07 13:59     ` Greg Kroah-Hartman
2021-10-07 14:47       ` Andy Shevchenko
2021-10-07 15:07         ` Greg Kroah-Hartman
2021-10-08  9:37     ` Thorsten Leemhuis
2021-10-13 10:31       ` Andy Shevchenko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.