Andrew Pinski <pinskia@gmail.com> writes:

> On Tue, Jan 9, 2024 at 11:57 AM H. Peter Anvin <hpa@zytor.com> wrote:
>>
>> Hi all, I'm going to stir the hornet's nest and make what has become the
>> ultimate sacrilege.
>>
>> Andrew Pinski recently made aware of this thread. I realize it was
>> released on April 1, 2018, and either was a joke or might have been
>> taken as one. However, I think there is validity to it, and I'm going to
>> try to motivate my opinion here.
>>
>> Both C and C++ has had a lot of development since 1999, and C++ has in
>> fact, in my personal opinion, finally "grown up" to be a better C for
>> the kind of embedded programming that an OS kernel epitomizes. I'm
>> saying that as the author of a very large number of macro and inline
>> assembly hacks in the kernel.
>>
>> What really makes me say that is that a lot of things we have recently
>> asked for gcc-specific extensions are in fact relatively easy to
>> implement in standard C++ and, in many cases, allows for infrastructure
>> improvement *without* global code changes (see below.)
>>
>> C++14 is in my option the "minimum" version that has reasonable
>> metaprogramming support has most of it without the type hell of earlier
>> versions (C++11 had most of it, but C++14 fills in some key missing pieces).
>>
>> However C++20 is really the main game changer in my opinion; although
>> earlier versions could play a lot of SFINAE hacks they also gave
>> absolutely useless barf as error messages. C++20 adds concepts, which
>> makes it possible to actually get reasonable errors.
>>
>> We do a lot of metaprogramming in the Linux kernel, implemented with
>> some often truly hideous macro hacks. These are also virtually
>> impossible to debug. Consider the uaccess.h type hacks, some of which I
>> designed and wrote. In C++, the various casts and case statements can be
>> unwound into separate template instances, and with some cleverness can
>> also strictly enforce things like user space vs kernel space pointers as
>> well as already-verified versus unverified user space pointers, not to
>> mention easily handle the case of 32-bit user space types in a 64-bit
>> kernel and make endianness conversion enforceable.
>>
>> Now, "why not Rust"? First of all, Rust uses a different (often, in my
>> opinion, gratuitously so) syntax, and not only would all the kernel
>> developers need to become intimately familiar to the level of getting
>> the same kind of "feel" as we have for C, but converting C code to Rust
>> isn't something that can be done piecemeal, whereas with some cleanups
>> the existing C code can be compiled as C++.
>>
>> However, I find that I disagree with some of David's conclusions; in
>> fact I believe David is unnecessarily *pessimistic* at least given
>> modern C++.
>>
>> Note that no one in their sane mind would expect to use all the features
>> of C++. Just like we have "kernel C" (currently a subset of C11 with a
>> relatively large set of allowed compiler-specific extensions) we would
>> have "kernel C++", which I would suggest to be a strictly defined subset
>> of C++20 combined with a similar set of compiler extensions.) I realize
>> C++20 compiler support is still very new for obvious reasons, so at
>> least some of this is forward looking.
>>
>> So, notes on this specific subset based on David's comments.
>>
>> On 4/1/18 13:40, David Howells wrote:
>> >
>> > Here are a series of patches to start converting the kernel to C++.  It
>> > requires g++ v8.
>> >
>> > What rocks:
>> >
>> >   (1) Inline template functions, which makes implementation of things like
>> >       cmpxchg() and get_user() much cleaner.
>>
>> Much, much cleaner indeed. But it also allows for introducing things
>> like inline patching of immediates *without* having to change literally
>> every instance of a variable.
>>
>> I wrote, in fact, such a patchset. It probably included the most awful
>> assembly hacks I have ever done, in order to implement the mechanics,
>> but what *really* made me give up on it was the fact that every site
>> where a patchable variable is invoked would have to be changed from, say:
>>
>>         foo = bar + some_init_offset;
>>
>> ... to ...
>>
>>         foo = imm_add(bar, some_init_offset);
>>
>>
>> >   (2) Inline overloaded functions, which makes implementation of things like
>> >       static_branch_likely() cleaner.
>>
>> Basically a subset of the above (it just means that for a specific set
>> of very common cases it isn't necessary to go all the way to using
>> templates, which makes the syntax nicer.)
>>
>> >   (3) Class inheritance.  For instance, all those inode wrappers that require
>> >       the base inode struct to be included and that has to be accessed with
>> >       something like:
>> >
>> >       inode->vfs_inode.i_mtime
>> >
>> >       when you could instead do:
>> >
>> >       inode->i_mtime
>>
>> This is nice, but it is fundamentally syntactic sugar. Similar things
>> can be done with anonymous structures, *except* that C doesn't allow
>> another structure to be anonymously included; you have to have an
>> entirely new "struct" statement defining all the fields. Welcome to
>> macro hell.
>>
>> > What I would disallow:
>> >
>> >   (1) new and delete.  There's no way to pass GFP_* flags in.
>>
>> Yes, there is.
>>
>> void * operator new (size_t count, gfp_flags_t flags);
>> void operator delete(void *ptr, ...whatever kfree/vfree/etc need, or a
>> suitable flag);
>>
>> >   (2) Constructors and destructors.  Nests of implicit code makes the code less
>> >       obvious, and the replacement of static initialisation with constructor
>> >       calls would make the code size larger.
>>
>> Yes and no. It also makes it *way* easier to convert to and from using
>> dedicated slabs; we already use semi-initialized slabs for some kinds of
>> objects, but it requires new code to make use of.
>>
>> We already *do* use constructors and *especially* destructors for a lot
>> of objects, we just call them out.
>>
>> Note that modern C++ also has the ability to construct and destruct
>> objects in-place, so allocation and construction/destruction aren't
>> necessarily related.
>>
>> There is no reason you can't do static initialization where possible;
>> even constructors can be evaluated at compile time if they are constexpr.
>>
>> Constructors (and destructors, for modules) in conjunction with gcc's
>> init_priority() extension is also a nice replacement for linker hack
>> tables to invoke intializer functions.
>>
>> >   (3) Exceptions and RTTI.  RTTI would bulk the kernel up too much and
>> >       exception handling is limited without it, and since destructors are not
>> >       allowed, you still have to manually clean up after an error.
>>
>> Agreed here, especially since on many platforms exception handling
>> relies on DWARF unwind information.
>
> Let me just add a few things about exceptions and RTTI.
> In the darwin kernel, C++ is used for device drivers and both
> exceptions and RTTI is not used there either. They have been using C++
> for kernel drivers since the early 2000s even.
> You can find out more at https://developer.apple.com/documentation/driverkit .
> There even was a GCC option added an option which would also disable
> RTTI and change the ABI to explicitly for the kernel.
> -fapple-kext/-mkernel (the former is for only loadable modules while
> the latter is for kernel too).
>
> Note even in GCC, we disable exceptions and RTTI while building GCC.
> This is specifically due to not wanting to use them and use other
> methods to do that.
> Note GDB on the other hand used to use setjmp/longjmp for their
> exception handling in C and I think they moved over to using C++
> exceptions which simplified things there. But as far as I know the
> Linux kernel does not use a mechanism like that (I know of copy
> from/to user using HW exceptions/error/interrupt handling but that is
> a special case only).
>
>
>>
>> >   (4) Operator overloading (except in special cases).
>>
>> See the example of inline patching above. But yes, overloading and
>> *especially* operator overloading should be used only with care; this is
>> pretty much true across the board.
>>
>> >   (5) Function overloading (except in special inline cases).
>>
>> I think we might find non-inline cases where it matters, too.
>>
>> >   (6) STL (though some type trait bits are needed to replace __builtins that
>> >       don't exist in g++).
>>
>> Just like there are parts of the C library which is really about the
>> compiler and not part of the library. <type_traits> is part of that for C++.
>
> There is an idea of a free standing C++ library. newer versions of
> GCC/libstdc++ does support that but IIRC can only be configured at
> compile time of GCC.
> type_traits and a few other headers are included in that. I have not
> looked into it fully though.

There is, and it's quite extensive (and I plan on extending it further
in GCC 15, if I get the chance to).  The full list of headers libstdc++
exports for freestanding use is a bit larger than the standard one:
https://gcc.gnu.org/cgit/gcc/tree/libstdc++-v3/include/Makefile.am#n28

(note that some are partially supported.. I lack a full list of which)

Most (actually, nearly all) of the libstdc++ code works for kernel
environments, and it is very mature and well-tested, so it can and
should be used by kernels too.  I haven't fully enabled using it in such
a manner yet, but

We could handle the kernel specific configuration via a multilib or so
(so, the multilib list becomes 32, 64, x32, and a new k64 or so on
amd64).  Presumably, something like that could be done for libgcc too?

It is not necessarily only configurable at build-time, but the libstdc++
configuration augmented by -ffreestanding and the one generated by a
'proper' freestanding build of libstdc++ differ currently.  Maybe they
can be brought together close enough for Linux?

Managarm, which is the kernel I had in mind when working on getting more
freestanding stuff has a dedicated kernel build of GCC, however, so I
didn't test this case much.  I'd like to, sooner or later, consolidate
it into the normal managarm system GCC, as a multilib, but I haven't had
time to do so yet.

In any case, I strongly prefer configuring toolchains 'properly'.

> Thanks,
> Andrew Pinski
>
>>
>> >   (7) 'class', 'private', 'namespace'.
>> >
>> >   (8) 'virtual'.  Don't want virtual base classes, though virtual function
>> >       tables might make operations tables more efficient.
>>
>> Operations tables *are* virtual classes. virtual base classes make sense
>> in a lot of cases, and we de facto use them already.
>>
>> However, Linux also does conversion of polymorphic objects from one type
>> to another -- that is for example how device nodes are implemented.
>> Using this with C++ polymorphism without RTTI does require some
>> compiler-specific hacks, unfortunately.
>>
>> > Issues:
>> >
>> >   (1) Need spaces inserting between strings and symbols.
>>
>> I have to admit I don't really grok this?
>>
>> >   (2) Direct assignment of pointers to/from void* isn't allowed by C++, though
>> >       g++ grudgingly permits it with -fpermissive.  I would imagine that a
>> >       compiler option could easily be added to hide the error entirely.
>>
>> Seriously. It should also enforce that it should be a trivial type.
>> Unfortunately it doesn't look like there is a way to create user-defined
>> implicit conversions from one pointer to another (via a helper class),
>> which otherwise would have had some other nice applications.
>>
>> >   (3) Need gcc v8+ to statically initialise an object of any struct that's not
>> >       really simple (e.g. if it's got an embedded union).
>>
>> Worst case: constexpr constructor.
>>
>> >   (4) Symbol length.  Really need to extern "C" everything to reduce the size
>> >       of the symbols stored in the kernel image.  This shouldn't be a problem
>> >       if out-of-line function overloading isn't permitted.
>>
>> This really would lose arguably the absolutely biggest advantage of C++:
>> type-safe linkage. This is the one reason why Linus actually tried to
>> use C++ in one single version of the kernel in the early days (0.99.14,
>> if I remember correctly.) At that time, g++ was nowhere near mature
>> enough, and it got dropped right away.
>>
>>
>> > So far, it gets as far as compiling init/main.c to a .o file.
>>
>> ;)


--
Arsen Arsenović