From: Arnd Bergmann <arnd@arndb.de>
To: Ingo Molnar <mingo@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-arch <linux-arch@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	"David S. Miller" <davem@davemloft.net>,
	Ard Biesheuvel <ardb@kernel.org>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
Date: Tue, 4 Jan 2022 20:37:28 -0500
Message-ID: <CAK8P3a3eFumM0dHkbdqL_1BwEZNRn9x3WxKbWKyapErd3SEEcw@mail.gmail.com>
In-Reply-To: <YdTg3bO6qs0frHVk@gmail.com>

On Tue, Jan 4, 2022 at 7:05 PM Ingo Molnar <mingo@kernel.org> wrote:
> * Arnd Bergmann <arnd@arndb.de> wrote:
>
> > From what I could tell, linux/sched.h was not the only such problem, but
> > I saw similarly bad issues with linux/fs.h (which is what I posted about
> > in November/December), linux/mm.h and linux/netdevice.h on the high
> > level, in low-level headers there are huge issues with linux/atomic.h,
> > linux/mutex.h, linux/pgtable.h etc. I expect that you have addressed
> > these as well,
>
> Correct, each of these was a problem - and a *lot* of other headers in
> addition to those:
>
>   kepler:~/mingo.tip.git> git diff --stat v5.16-rc8.. include/linux/ arch/*/include/asm/ | grep changed
>
>     1335 files changed, 59677 insertions(+), 56582 deletions(-)
>
> and I reduced all the headers that showed up in the bloat-profile to a
> fraction of their original size:
>
>     ------------------------------------------------------------------------------------------
>     | Combined, preprocessed C code size of header, without line markers,
>     | with comments stripped:
>     ------------------------------.-----------------------------.-----------------------------
>                                   | v5.16-rc7                   |  -fast-headers-v1
>                                   |-----------------------------|-----------------------------
>      #include <linux/sched.h>     | LOC: 13,292 | headers:  324 |  LOC:    769 | headers:   64
>      #include <linux/wait.h>      | LOC:  9,369 | headers:  235 |  LOC:    483 | headers:   46
>      #include <linux/rcupdate.h>  | LOC:  8,975 | headers:  224 |  LOC:  1,385 | headers:   86
>      #include <linux/hrtimer.h>   | LOC: 10,861 | headers:  265 |  LOC:    229 | headers:   37
>      #include <linux/fs.h>        | LOC: 22,497 | headers:  427 |  LOC:  1,993 | headers:  120
>      #include <linux/cred.h>      | LOC: 17,257 | headers:  368 |  LOC:  4,830 | headers:  129
>      #include <linux/dcache.h>    | LOC: 10,545 | headers:  253 |  LOC:    858 | headers:   65
>      #include <linux/cgroup.h>    | LOC: 33,518 | headers:  522 |  LOC:  2,477 | headers:  111
>      #include <linux/module.h>    | LOC: 16,948 | headers:  339 |  LOC:  2,239 | headers:  122
>      #include <linux/kobject.h>   | LOC: 15,210 | headers:  318 |  LOC:    799 | headers:   59
>      #include <linux/device.h>    | LOC: 20,505 | headers:  408 |  LOC:  2,131 | headers:  123
>      #include <linux/gfp.h>       | LOC: 13,543 | headers:  303 |  LOC:    181 | headers:   26
>      #include <linux/slab.h>      | LOC: 14,037 | headers:  307 |  LOC:    999 | headers:   74
>      #include <linux/mm.h>        | LOC: 26,727 | headers:  453 |  LOC:  1,855 | headers:  133
>      #include <linux/mmzone.h>    | LOC: 12,755 | headers:  293 |  LOC:    832 | headers:   64
>      #include <linux/swap.h>      | LOC: 38,292 | headers:  559 |  LOC: 11,085 | headers:  294
>      #include <linux/writeback.h> | LOC: 36,481 | headers:  550 |  LOC:  1,566 | headers:   92
>      #include <linux/skbuff.h>    | LOC: 36,130 | headers:  558 |  LOC:  1,209 | headers:   89
>      #include <linux/tcp.h>       | LOC: 60,133 | headers:  725 |  LOC:  3,829 | headers:  153
>      #include <linux/udp.h>       | LOC: 59,411 | headers:  721 |  LOC:  3,236 | headers:  146
>      #include <linux/filter.h>    | LOC: 54,172 | headers:  689 |  LOC:  4,087 | headers:   73
>      #include <linux/interrupt.h> | LOC: 14,085 | headers:  340 |  LOC:  2,629 | headers:  124
>
>      #include <net/sock.h>        | LOC: 58,880 | headers:  715 |  LOC:  1,543 | headers:   98
>
>      #include <asm/processor.h>   | LOC:  7,821 | headers:  204 |  LOC:    618 | headers:   41
>      #include <asm/page.h>        | LOC:  1,540 | headers:   97 |  LOC:  1,193 | headers:   82
>      #include <asm/pgtable.h>     | LOC: 12,949 | headers:  297 |  LOC:  5,742 | headers:  217

Ok, this is roughly the list of headers that I had looked at previously.

> <linux/atomic.h> wasn't a particularly big problem - but it does get
> included everywhere, so I moved the most common atomic_t definition into
> <linux/types.h> (on 64-bit kernels), which allowed a big reduction for the
> majority of cases that don't use the atomic APIs:

Good, I have a patch for the same thing, including moving atomic64_t
and atomic_long_t to linux/types.h there -- I don't think it would be good to
have it in different places on 32-bit architectures.

On arm machines, I found atomic.h to be problematic because it is a large
generated header that depends on the barriers which in turn require other
stuff.

>  #include <linux/atomic.h>               | LOC:    176 | headers:   26
>  #include <linux/atomic_api.h>           | LOC:  2,785 | headers:   52
>
> But <linux/atomic_api.h> is still included in ~75% of .c files, mostly for
> good reasons, because it's a very popular low level API.

These are the x86 numbers, right?

> > but I'd like to make sure that your changes are reasonably complete on
> > arm32 and arm64 to avoid having to do the big cleanup more than once.
>
> I did test ARM64 extensively in terms of build coverage - but not in terms
> of header bloat, and I'm sure more could be done there!

My guess is that each architecture has a couple of dark corners that
require cleaning up before we actually see the benefit of the series.
I'm personally most interested in arm32 and arm64 because that's what
I do my testing on, and I'll try to find those corners. One thing I remember
for arm32 is that there is a nasty dependency for get_current() ->
PAGE_SIZE -> asm/pgtable.h, with pgtable including the world again.
You probably got this one, but any such missing thing can lead to the
other cleanups not helping that much.

> > My approach to the large mid-level headers is somewhat different: rather
> > than completely avoiding them from getting included, I would like to
> > split up the structure definitions from the inline functions.
>
> That's a big chunk of what the -fast-headers tree does: I've split over 85
> headers into <linux/header_types.h> and <linux/header_api.h>...
>
> I've also split up headers further where needed, in particular mm.h
> required multiple levels of splitting to get the dependencies of the most
> commonly used <linux/mm_types.h> and <linux/mm_api.h> headers under
> control:
>
>   kepler:~/mingo.tip.git> ls -ldt include/linux/mm*api*.h
>   -rw-rw-r-- 1 mingo mingo 77130 Jan  4 13:32 include/linux/mm_api.h
>   -rw-rw-r-- 1 mingo mingo 22227 Jan  4 13:32 include/linux/mmzone_api.h
>   -rw-rw-r-- 1 mingo mingo  6759 Jan  4 13:32 include/linux/mm_api_extra.h
>   -rw-rw-r-- 1 mingo mingo   479 Jan  4 13:31 include/linux/mm_api_exe_file.h
>   -rw-rw-r-- 1 mingo mingo   960 Jan  4 13:31 include/linux/mm_api_truncate.h
>   -rw-rw-r-- 1 mingo mingo  1262 Jan  4 13:31 include/linux/mm_api_kvmalloc.h
>   -rw-rw-r-- 1 mingo mingo   719 Jan  4 13:31 include/linux/mm_api_gate_area.h
>   -rw-rw-r-- 1 mingo mingo  1342 Jan  4 13:31 include/linux/mm_api_kasan.h
>   -rw-rw-r-- 1 mingo mingo  3007 Jan  4 13:31 include/linux/mm_api_tlb_flush.h

Ah, good. That is pretty close to what I had in mind as well, so maybe
we can convince Linus after all. ;-)
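
To make the types/API split concrete, here is a minimal sketch. The
names (struct foo, struct bar, foo_get(), foo_put(), and the two header
names in the comments) are made up for illustration and are not taken
from either tree. The point is only that a structure embedding another
one needs the layout, not the inline helpers:

```c
/* ---- would live in a hypothetical <linux/foo_types.h> ---- */
/* Only the layout: enough to embed struct foo by value elsewhere. */
struct foo {
	int refs;
	unsigned long flags;
};

/* ---- would live in a hypothetical <linux/foo_api.h> ---- */
/* Inline helpers; in a real header these tend to pull in more includes. */
static inline void foo_get(struct foo *f)
{
	f->refs++;
}

static inline int foo_put(struct foo *f)
{
	/* Returns 1 when the last reference was dropped. */
	return --f->refs == 0;
}

/* ---- a user that only needs the types half ---- */
struct bar {
	struct foo foo;		/* embedding needs the full layout of struct foo */
	int id;
};
```

A file that only declares struct bar would include the types header
alone, and only the files that actually call foo_get()/foo_put() would
pay for the API header.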

> The results are pretty nice:
>
>  # vanilla:
>
>    #include <linux/mm.h>                   | LOC: 26,728 | headers:  453
>
>  # -fast-headers:
>
>    #include <linux/mm.h>                   | LOC:  1,855 | headers:  132  # == mm_types.h
>    #include <linux/mm_types.h>             | LOC:  1,855 | headers:  131
>    #include <linux/mm_api.h>               | LOC:  8,587 | headers:  229
>
> And <linux/mm_api.h> is now included only in about 25% of the .c files - in
> the vanilla kernel the use percentage is over ~90%.
>
> But despite all those reductions, <linux/mm_api.h> is still a header with
> one of the largest cumulative footprints within a (distro) kernel build:
>
>                                                               | stripped lines of code
>                                                               |              _____________________________
>                                                               |             | headers included recursively
>                                                               |             |                _______________________________
>                                                               |             |               | usage in a distro kernel build
>  ____________                                                 |             |               |         _________________________________________
> | header name                                                 |             |               |        | million lines of comment-stripped C code
> |                                                             |             |               |        |
>   #include <linux/spinlock_api.h>                             | LOC:  5,142 | headers:  123 | 10,168 | MLOC:   52.2 | #############
>   #include <linux/device/driver.h>                            | LOC:  4,132 | headers:  169 | 12,306 | MLOC:   50.8 | ############
>   #include <linux/mm_api.h>                                   | LOC:  8,584 | headers:  230 |  5,135 | MLOC:   44.0 | ###########
>   #include <linux/skbuff_api.h>                               | LOC:  8,404 | headers:  190 |  5,065 | MLOC:   42.5 | ##########
>   #include <linux/atomic_api.h>                               | LOC:  2,785 | headers:   52 | 15,282 | MLOC:   42.5 | ##########
>   #include <asm/spinlock.h>                                   | LOC:  4,039 | headers:   83 | 10,168 | MLOC:   41.0 | ##########
>   #include <asm/qrwlock.h>                                    | LOC:  4,039 | headers:   82 | 10,168 | MLOC:   41.0 | ##########
>   #include <asm-generic/qrwlock.h>                            | LOC:  4,039 | headers:   81 | 10,168 | MLOC:   41.0 | ##########
>   #include <linux/page_ref.h>                                 | LOC:  5,397 | headers:  168 |  7,578 | MLOC:   40.8 | ##########
>   #include <asm/qspinlock.h>                                  | LOC:  3,990 | headers:   80 | 10,169 | MLOC:   40.5 | ##########
>   #include <linux/device_types.h>                             | LOC:  2,131 | headers:  122 | 17,424 | MLOC:   37.1 | #########
>   #include <linux/module.h>                                   | LOC:  2,239 | headers:  122 | 16,472 | MLOC:   36.8 | #########
>   #include <net/cfg80211.h>                                   | LOC: 29,004 | headers:  423 |  1,205 | MLOC:   34.9 | ########
>   #include <linux/pci.h>                                      | LOC:  7,092 | headers:  232 |  4,849 | MLOC:   34.3 | ########
>   #include <linux/netdevice_api.h>                            | LOC:  8,434 | headers:  225 |  4,065 | MLOC:   34.2 | ########
>   #include <linux/refcount_api.h>                             | LOC:  3,421 | headers:   87 |  9,776 | MLOC:   33.4 | ########
>
> ( The 'MLOC' footprint estimate is number of usages times
>   preprocessed-stripped-header size. )

This is also the metric that I used in my scripts, except I measured
the preprocessed size in bytes instead of lines, which should make
little difference.
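
For reference, the metric is just a product, and a small sketch of it
looks like the code below. The struct and function names are mine and
purely illustrative, and the three rows are copied from the table quoted
above. For these rows the quoted MLOC figures match the product
truncated (not rounded) to one decimal, which is what the second helper
assumes. A byte-based variant would only swap the loc field for a byte
count.

```c
#include <stdint.h>

/* One row of the footprint table: header size and number of users. */
struct header_stat {
	const char *name;
	uint64_t loc;		/* preprocessed, comment-stripped lines */
	uint64_t usages;	/* number of inclusions in the build */
};

/* Cumulative footprint in lines: header size times number of usages. */
static uint64_t footprint_loc(const struct header_stat *h)
{
	return h->loc * h->usages;
}

/* Same value in tenths of a million lines, truncated (522 is 52.2 MLOC). */
static uint64_t footprint_tenths_mloc(const struct header_stat *h)
{
	return footprint_loc(h) / 100000;
}

/* Three rows copied from the quoted table. */
static const struct header_stat sample[] = {
	{ "linux/spinlock_api.h", 5142, 10168 },
	{ "linux/mm_api.h",       8584,  5135 },
	{ "linux/atomic_api.h",   2785, 15282 },
};
```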

> I've reduced header bloat through three primary angles of attack:
>
>   - reducing number of inclusions
>
>   - reducing header size itself, by type/API splitting & by segmenting
>     headers along API usage frequency
>
>   - decoupling headers from each other
>
> As you can see, fast-headers -v1 is much improved (on x86), but there's
> plenty of work left, such as <net/cfg80211.h>. :-)

Right. I mainly focused on splitting types from the rest, which I think
brings most of the benefits, but taking it further as you did here
helps more.

> > Linus didn't really like my approach,
>
> Yeah, so without having a significant build time speedup I didn't like my
> approach(es) either, which is why I didn't post this tree for a long time. :-)
>
> But the results speak for themselves IMO, and we cannot ignore this: my
> project actually accelerated as I progressed, because the kernel rebuilds,
> especially incremental ones, became faster and faster...
>
> Linux kernel header dependencies need to be simplified.

Agreed. In my 2020 experiments, cleaning up the first ~100 headers had
very little effect, because everything was still included through some
other header. Cleaning up the next 100 brought huge improvements, but I
got discouraged because it started breaking every driver due to missing
indirect includes.

> > but I suspect he'll have similar
> > concerns about your solution for linux/sched.h, especially if we end up
> > applying the same hack to other commonly used structures (sk_buff,
> > mm_struct, super_block) in the end.
>
> So the per_task approach is pretty much unavoidable under the constraint of
> having no runtime overhead, given that task_struct is a historic union of a
> zillion types, where 99% of the users don't actually need to know about
> those types.
>
> ( We could eventually get rid of per_task() as well, by turning complex
>   embedded structs into pointers - but that has runtime overhead due to the
>   indirections, and I tried hard to make this approach runtime-invariant,
>   at least conceptually. )

Would it be possible to have one common task_struct definition that has
all the frequently-accessed fields, plus another larger structure that
embeds the smaller structure plus all the other stuff? I suppose that
would require even larger scale reworks, but it may be a nicer end
result. (Again, I have yet to read your patches, so there is probably
an obvious answer why you didn't do this.)
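
A rough sketch of what I mean, with made-up names (task_common,
task_full, task_pid(), to_task_full()) that do not exist in either tree.
The small structure is embedded by value at a known offset, so going
from one view to the other is plain pointer arithmetic with no extra
indirection at runtime:

```c
#include <stddef.h>

/* Hypothetical small structure with only the frequently accessed fields. */
struct task_common {
	int pid;
	unsigned int flags;
};

/* Hypothetical full structure: embeds the common part, adds the rest. */
struct task_full {
	struct task_common common;	/* embedded by value, not a pointer */
	char comm[16];
	long rarely_used[32];		/* stand-in for large subsystem state */
};

/* Most code would only need this view and the small header defining it. */
static inline int task_pid(const struct task_common *t)
{
	return t->pid;
}

/* The few users needing everything recover the full object from the part. */
static inline struct task_full *to_task_full(struct task_common *t)
{
	return (struct task_full *)((char *)t - offsetof(struct task_full, common));
}
```

Only the code that calls something like to_task_full() would need the
header with the large definition. Whether that set of users is small
enough in practice is exactly the part I have not checked.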

          Arnd
