All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] [RFC] Initialization of unused function parameters
@ 2022-06-14 14:48 Alexander Potapenko
  2022-06-14 16:48 ` Linus Torvalds
  0 siblings, 1 reply; 14+ messages in thread
From: Alexander Potapenko @ 2022-06-14 14:48 UTC (permalink / raw)
  To: glider
  Cc: Evgenii Stepanov, Kees Cook, Linus Torvalds, Marco Elver,
	Nathan Chancellor, Nick Desaulniers, Thomas Gleixner,
	Vitaly Buka, linux-kernel, linux-toolchains

Consider the patch below, where under certain circumstances
initialization of `unsigned seq` and `struct inode *inode` passed into
step_into() may be skipped.  In particular, if the call to lookup_fast()
in walk_component() returns NULL, and lookup_slow() returns a valid
dentry, then the `seq` and `inode` will remain uninitialized until the
call to step_into() (see [1] for more info).

Right now step_into() correctly handles this and does not use the
uninitialized values, but explicitly initializing them will guarantee
their sanity in the case of future changes to step_into().

The bigger question I want to raise here is whether we want to
discourage passing uninitialized variables to functions in the kernel
altogether. While this is not required by the C standard (see the whole
discussion at [2]), KMSAN can help with detecting cases where uninits
are passed to functions (there is a more aggressive
-fsanitize-memory-param-retval compiler option for that). This is
somewhat similar to -Wmaybe-uninitialized, but works dynamically and has
no false positives.

Would initializing function parameters be a net benefit to the kernel?

[1] https://github.com/ClangBuiltLinux/linux/issues/1648#issuecomment-1146608063
[2] https://github.com/ClangBuiltLinux/linux/issues/1648

Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marco Elver <elver@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Buka <vitalybuka@google.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-toolchains@vger.kernel.org
Signed-off-by: Alexander Potapenko <glider@google.com>
---
 fs/namei.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 509657fdf4f56..dde370338f5d6 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2001,8 +2001,8 @@ static const char *handle_dots(struct nameidata *nd, int type)
 static const char *walk_component(struct nameidata *nd, int flags)
 {
 	struct dentry *dentry;
-	struct inode *inode;
-	unsigned seq;
+	struct inode *inode = NULL;
+	unsigned seq = 0;
 	/*
 	 * "." and ".." are special - ".." especially so because it has
 	 * to be able to know about the current root directory and
@@ -3349,8 +3349,8 @@ static const char *open_last_lookups(struct nameidata *nd,
 	struct dentry *dir = nd->path.dentry;
 	int open_flag = op->open_flag;
 	bool got_write = false;
-	unsigned seq;
-	struct inode *inode;
+	unsigned seq = 0;
+	struct inode *inode = NULL;
 	struct dentry *dentry;
 	const char *res;
 
-- 
2.36.1.476.g0c4daa206d-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] [RFC] Initialization of unused function parameters
  2022-06-14 14:48 [PATCH] [RFC] Initialization of unused function parameters Alexander Potapenko
@ 2022-06-14 16:48 ` Linus Torvalds
  2022-06-14 17:11   ` Nick Desaulniers
  2022-06-14 18:07   ` Alexander Potapenko
  0 siblings, 2 replies; 14+ messages in thread
From: Linus Torvalds @ 2022-06-14 16:48 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Evgenii Stepanov, Kees Cook, Marco Elver, Nathan Chancellor,
	Nick Desaulniers, Thomas Gleixner, Vitaly Buka,
	Linux Kernel Mailing List, linux-toolchains

On Tue, Jun 14, 2022 at 7:49 AM Alexander Potapenko <glider@google.com> wrote:
>
> The bigger question I want to raise here is whether we want to
> discourage passing uninitialized variables to functions in the kernel
> altogether.

I'm assuming you mean pass by reference.

Some functions are really fundamentally about initializing things, and
expect uninitialized allocations.

Obviously the traditional example of this is "memset()", and that one
can be special-cased as obvious, but we probably have a ton of wrapper
things like that.

IOW, things like just "snprintf()" etc is fundamentally passed an
uninitialized buffer, because the whole point is that it will write to
that buffer.

And no, we don't want to initialize it, since the buffer may be big
(on purpose).

Now, for *small* things (like that "pointer to inode") that aren't
some kind of array or big structure, I think it might be good to
perhaps be stricter. But even there we do have cases where we pass
things by reference because the function is explicitly designed to
initialize the value: the argument isn't really "an argument", it's a
"second return value".

But always initializing in the caller sounds stupid and
counter-productive, since the point is to initialize by calling the
helper function (think things like returning a "cookie" or similar:
initializing the cookie to NULL in the caller is just plain _wrong_.

What I think might be a good model is to be able to mark such
arguments as "must be initialized by callee".

So then the rule could be that small arguments passed by reference
have to be either initialized by the caller, or the argument must have
that "initialized by callee" attribute, and then the initialization
would be enforced in the callee instead.

But making the rule be that the caller *always* has to initialize
sounds really wrong to me.

             Linus

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] [RFC] Initialization of unused function parameters
  2022-06-14 16:48 ` Linus Torvalds
@ 2022-06-14 17:11   ` Nick Desaulniers
  2022-06-14 17:24     ` Linus Torvalds
  2022-06-14 18:07   ` Alexander Potapenko
  1 sibling, 1 reply; 14+ messages in thread
From: Nick Desaulniers @ 2022-06-14 17:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Evgenii Stepanov, Kees Cook, Marco Elver,
	Nathan Chancellor, Thomas Gleixner, Vitaly Buka,
	Linux Kernel Mailing List, linux-toolchains

On Tue, Jun 14, 2022 at 9:48 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Tue, Jun 14, 2022 at 7:49 AM Alexander Potapenko <glider@google.com> wrote:
> >
> > The bigger question I want to raise here is whether we want to
> > discourage passing uninitialized variables to functions in the kernel
> > altogether.
>
> I'm assuming you mean pass by reference.
>
> Some functions are really fundamentally about initializing things, and
> expect uninitialized allocations.
>
> Obviously the traditional example of this is "memset()", and that one
> can be special-cased as obvious, but we probably have a ton of wrapper
> things like that.
>
> IOW, things like just "snprintf()" etc is fundamentally passed an
> uninitialized buffer, because the whole point is that it will write to
> that buffer.
>
> And no, we don't want to initialize it, since the buffer may be big
> (on purpose).
>
> Now, for *small* things (like that "pointer to inode") that aren't
> some kind of array or big structure, I think it might be good to
> perhaps be stricter. But even there we do have cases where we pass
> things by reference because the function is explicitly designed to
> initialize the value: the argument isn't really "an argument", it's a
> "second return value".
>
> But always initializing in the caller sounds stupid and
> counter-productive, since the point is to initialize by calling the
> helper function (think things like returning a "cookie" or similar:
> initializing the cookie to NULL in the caller is just plain _wrong_.
>
> What I think might be a good model is to be able to mark such
> arguments as "must be initialized by callee".

Yeah, being able to enforce that would be nice.

Now that we have clang's static analyzer wired up (commit 6ad7cbc01527
("Makefile: Add clang-tidy and static analyzer support to makefile")),
Intel's 0day bot has been reporting cases it finds for some classes of
warnings.  There's been a few interesting (to me) cases where these
"init" routines would conditionally initialize a "second return value"
but the caller either did not do return value checking or the callee
was not marked __must_check (or both).

As with -Wsometimes-uninitialized, my experience has been that folks
consistently get error handling/exceptional cases wrong in so far as
passing unitialized values later.  Clang's -Wsometimes-uninitialized
is intra-proceedural, so doesn't catch the problems with "init"
routines. Clang's static analyzer is interproceedural; the trade off
being the time the analysis takes.

Maybe a new function parameter attribute would be nice?

#define __must_init __attribute__((must_init))
int init (int * __must_init x) {
// ^ warning: function parameter x marked '__attribute__((must_init))'
not unconditionally initialized
  if (stars_dont_align) {
    return -42;
  }
  *x = 42;
  return 0;
}
void foo (void) {  int x; init(&x); /* use of x without fear */ }


>
> So then the rule could be that small arguments passed by reference
> have to be either initialized by the caller, or the argument must have
> that "initialized by callee" attribute, and then the initialization
> would be enforced in the callee instead.
>
> But making the rule be that the caller *always* has to initialize
> sounds really wrong to me.
>
>              Linus



-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] [RFC] Initialization of unused function parameters
  2022-06-14 17:11   ` Nick Desaulniers
@ 2022-06-14 17:24     ` Linus Torvalds
  2022-06-14 18:08       ` Nick Desaulniers
  0 siblings, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2022-06-14 17:24 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Alexander Potapenko, Evgenii Stepanov, Kees Cook, Marco Elver,
	Nathan Chancellor, Thomas Gleixner, Vitaly Buka,
	Linux Kernel Mailing List, linux-toolchains

On Tue, Jun 14, 2022 at 10:11 AM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> Maybe a new function parameter attribute would be nice?

Right, exactly something like this seems reasonable.

> #define __must_init __attribute__((must_init))
> int init (int * __must_init x) {
> // ^ warning: function parameter x marked '__attribute__((must_init))'
> not unconditionally initialized
>   if (stars_dont_align) {
>     return -42;
>   }
>   *x = 42;
>   return 0;
> }
> void foo (void) {  int x; init(&x); /* use of x without fear */ }

Yeah. So for this pattern of uninitialized pass-by-reference
arguments, we'd get the warning in the callee if it's __must_init, and
in the caller if it's not.

Now, I suspect that we have a lot of cases where the initializing
function returns an error, and we currently don't initialize the
pass-by-ref argument in that case.

In a perfect world, we'd have some way to annotate that case too, but
I suspect it gets too complicated both for users and for the compiler.

Error handling in C is ugly, but it's also why we in the kernel have
that ERR_PTR() model that solves the "return *both* an error *and* a
pointer" case. Which is one of the most common cases we have for this
situation.

I suspect that the simple "__must_init" model would work well enough
for us in practice. Yes, it might make us then initialize things
"unnecessarily" in error cases, but that doesn't sound too onerous.

And I think the "__must_init" model makes conceptual sense, in ways
that the "caller has to initialize things even if it is literally
asking another function to initialize the value" model does *not* make
sense.

But hey, I didn't look at just how painful it would really be. This is
all "I _think_ that would work really well for the kernel" without any
actual data to back it up with.

                Linus

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] [RFC] Initialization of unused function parameters
  2022-06-14 16:48 ` Linus Torvalds
  2022-06-14 17:11   ` Nick Desaulniers
@ 2022-06-14 18:07   ` Alexander Potapenko
  2022-06-14 18:30     ` Linus Torvalds
  1 sibling, 1 reply; 14+ messages in thread
From: Alexander Potapenko @ 2022-06-14 18:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Evgenii Stepanov, Kees Cook, Marco Elver, Nathan Chancellor,
	Nick Desaulniers, Thomas Gleixner, Vitaly Buka,
	Linux Kernel Mailing List, linux-toolchains

On Tue, Jun 14, 2022 at 6:48 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Tue, Jun 14, 2022 at 7:49 AM Alexander Potapenko <glider@google.com> wrote:
> >
> > The bigger question I want to raise here is whether we want to
> > discourage passing uninitialized variables to functions in the kernel
> > altogether.
>
> I'm assuming you mean pass by reference.

No, sorry for being unclear. I mean passing by value.
In the given example the prototype of step_into looks as follows (see
https://elixir.bootlin.com/linux/latest/source/fs/namei.c#L1846):

  static const char *step_into(struct nameidata *nd, int flags, struct
dentry *dentry, struct inode *inode, unsigned seq);

, and the local variables `struct inode *inode` and `unsigned seq` are
being passed to it by value, i.e. in certain cases the struct inode
pointer and the unsigned seq are uninitialized.

Does that change anything?

> Some functions are really fundamentally about initializing things, and
> expect uninitialized allocations.

Agreed, there are a lot of functions around that initialize one struct
or another, they are out of the scope.


> What I think might be a good model is to be able to mark such
> arguments as "must be initialized by callee".

This sounds interesting. In the given example I would suggest that the
call to lookup_fast()
(https://elixir.bootlin.com/linux/latest/source/fs/namei.c#L2016)
should be initializing &inode and &seq, so that it is guaranteed that
they are passed initialized into step_into().


-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

Diese E-Mail ist vertraulich. Falls Sie diese fälschlicherweise
erhalten haben sollten, leiten Sie diese bitte nicht an jemand anderes
weiter, löschen Sie alle Kopien und Anhänge davon und lassen Sie mich
bitte wissen, dass die E-Mail an die falsche Person gesendet wurde.


This e-mail is confidential. If you received this communication by
mistake, please don't forward it to anyone else, please erase all
copies and attachments, and please let me know that it has gone to the
wrong person.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] [RFC] Initialization of unused function parameters
  2022-06-14 17:24     ` Linus Torvalds
@ 2022-06-14 18:08       ` Nick Desaulniers
  2022-06-14 22:27         ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Nick Desaulniers @ 2022-06-14 18:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Evgenii Stepanov, Kees Cook, Marco Elver,
	Nathan Chancellor, Thomas Gleixner, Vitaly Buka,
	Linux Kernel Mailing List, linux-toolchains

On Tue, Jun 14, 2022 at 10:24 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Tue, Jun 14, 2022 at 10:11 AM Nick Desaulniers
> <ndesaulniers@google.com> wrote:
> >
> > Maybe a new function parameter attribute would be nice?
>
> Right, exactly something like this seems reasonable.
>
> > #define __must_init __attribute__((must_init))
> > int init (int * __must_init x) {
> > // ^ warning: function parameter x marked '__attribute__((must_init))'
> > not unconditionally initialized
> >   if (stars_dont_align) {
> >     return -42;
> >   }
> >   *x = 42;
> >   return 0;
> > }
> > void foo (void) {  int x; init(&x); /* use of x without fear */ }

Thinking more about your snprintf example which is potentially more
costly than initializing just one value...here's another case for us
to consider.

int maybe_init (char* buf) {
  if (stars_align)
    return -42;
  buf[42] = 0;
  return 0;
}

char foo (void) {
  char buf [PATH_MAX];
  maybe_init(buf);
  return char[42];
}

I'm thinking the attribute would have to go on the pointed to type not
the pointer, i.e. `__must_init char *` not `char * __must_init`.
Similarly, you'd need to unconditionally initialize all of buf which
might be painful.  __must_init would not give you the specificity to
differentiate between "this whole buffer must be initialized" vs "only
index 42 need be initialized."  I _think_ that's fine, perhaps "only
index 42 need be initialized" is YAGNI and you could just _not_ use
__must_init for such a case.

One thing I'm curious about; if you have an aggregate in C (struct or
array) and don't fully initialize the whole object, just
members/sub-objects, but only use those I assume that's not UB? (Which
is what my maybe_init example does).  I think that's fine.

Another thing that makes me uncertain about my maybe_init example
above is decay-to-pointer, and the compiler's ability to track things
like __builtin_object_size precisely (or across translation units);
Kees is having a dog of a time with __builtin_object_size of structs
that contain flexible array members for example.  If a function
accepts a `__must_init struct foo*`, can we know if we were passed an
array of struct foo* vs just one? What if `struct foo` is an opaque
type; then the callee can't verify that all members of the struct have
been initialized (at least designated initialized wouldn't work;
memcpy should though...can you get the sizeof an opaque type though?)

But maybe I'm getting ahead of myself and am just describing good unit
tests when building such a feature.  Probably worth prototyping to get
a sense of the ergonomics and limitations.  But it doesn't sound too
controversial, which is probably good enough to get started. I'm sure
Alexander and team will have additional ideas or proposals for
achieving the same outcome.
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] [RFC] Initialization of unused function parameters
  2022-06-14 18:07   ` Alexander Potapenko
@ 2022-06-14 18:30     ` Linus Torvalds
  2022-06-14 20:19       ` Alexander Potapenko
  0 siblings, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2022-06-14 18:30 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Evgenii Stepanov, Kees Cook, Marco Elver, Nathan Chancellor,
	Nick Desaulniers, Thomas Gleixner, Vitaly Buka,
	Linux Kernel Mailing List, linux-toolchains

On Tue, Jun 14, 2022 at 11:08 AM Alexander Potapenko <glider@google.com> wrote:
>
> On Tue, Jun 14, 2022 at 6:48 PM Linus Torvalds
> >
> > I'm assuming you mean pass by reference.
>
> No, sorry for being unclear. I mean passing by value.

Pass-by-value most definitely should warn about uninitialized variables.

> In the given example the prototype of step_into looks as follows (see
> https://elixir.bootlin.com/linux/latest/source/fs/namei.c#L1846):
>
>   static const char *step_into(struct nameidata *nd, int flags, struct
> dentry *dentry, struct inode *inode, unsigned seq);
>
> , and the local variables `struct inode *inode` and `unsigned seq` are
> being passed to it by value, i.e. in certain cases the struct inode
> pointer and the unsigned seq are uninitialized.

Then those cases should warn. No question about it.

I assume the only reason they don't warn right now is that the
compiler doesn't see that they are uninitialized, possibly due to some
earlier pass-by-reference use.

             Linus

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] [RFC] Initialization of unused function parameters
  2022-06-14 18:30     ` Linus Torvalds
@ 2022-06-14 20:19       ` Alexander Potapenko
  2022-06-14 20:43         ` Linus Torvalds
  2022-06-14 21:40         ` Segher Boessenkool
  0 siblings, 2 replies; 14+ messages in thread
From: Alexander Potapenko @ 2022-06-14 20:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Evgenii Stepanov, Kees Cook, Marco Elver, Nathan Chancellor,
	Nick Desaulniers, Thomas Gleixner, Vitaly Buka,
	Linux Kernel Mailing List, linux-toolchains

On Tue, Jun 14, 2022 at 8:31 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Tue, Jun 14, 2022 at 11:08 AM Alexander Potapenko <glider@google.com> wrote:
> >
> > On Tue, Jun 14, 2022 at 6:48 PM Linus Torvalds
> > >
> > > I'm assuming you mean pass by reference.
> >
> > No, sorry for being unclear. I mean passing by value.
>
> Pass-by-value most definitely should warn about uninitialized variables.
>
> > In the given example the prototype of step_into looks as follows (see
> > https://elixir.bootlin.com/linux/latest/source/fs/namei.c#L1846):
> >
> >   static const char *step_into(struct nameidata *nd, int flags, struct
> > dentry *dentry, struct inode *inode, unsigned seq);
> >
> > , and the local variables `struct inode *inode` and `unsigned seq` are
> > being passed to it by value, i.e. in certain cases the struct inode
> > pointer and the unsigned seq are uninitialized.
>
> Then those cases should warn. No question about it.

What about the cases where these uninitialized values are never used
in the callee?
step_into() is one of the instances from the kernel, but here is a
distilled example from https://godbolt.org/z/s1oPve6d4:

================
char *kmalloc(int size);

char *kmalloc_or_not(int flag, int size, char *p) {
  if (flag)
    return kmalloc(size);
  else
    return p;
}

char global[16];

char *p(int flag) {
  char *c;
  int size;
  if (flag)
    return kmalloc_or_not(1, 4, c);
  else
    return kmalloc_or_not(0, size, global);
}
================

In this example `size` is passed into kmalloc_or_not() initialized,
however it is never used, so the code probably has defined behavior.
In this particular case Clang's -Winitialized is able to notice that
`size` is uninitialized, but in more complex cases it cannot.

> I assume the only reason they don't warn right now is that the
> compiler doesn't see that they are uninitialized, possibly due to some
> earlier pass-by-reference use.

That's right, and here is where dynamic analysis comes to the rescue.
So should we let KMSAN catch such cases and consider them bugs^W
smelly code patterns that need to be fixed?

>
>              Linus



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

Diese E-Mail ist vertraulich. Falls Sie diese fälschlicherweise
erhalten haben sollten, leiten Sie diese bitte nicht an jemand anderes
weiter, löschen Sie alle Kopien und Anhänge davon und lassen Sie mich
bitte wissen, dass die E-Mail an die falsche Person gesendet wurde.


This e-mail is confidential. If you received this communication by
mistake, please don't forward it to anyone else, please erase all
copies and attachments, and please let me know that it has gone to the
wrong person.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] [RFC] Initialization of unused function parameters
  2022-06-14 20:19       ` Alexander Potapenko
@ 2022-06-14 20:43         ` Linus Torvalds
  2022-06-14 21:40         ` Segher Boessenkool
  1 sibling, 0 replies; 14+ messages in thread
From: Linus Torvalds @ 2022-06-14 20:43 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Evgenii Stepanov, Kees Cook, Marco Elver, Nathan Chancellor,
	Nick Desaulniers, Thomas Gleixner, Vitaly Buka,
	Linux Kernel Mailing List, linux-toolchains

On Tue, Jun 14, 2022 at 1:20 PM Alexander Potapenko <glider@google.com> wrote:
>
> What about the cases where these uninitialized values are never used
> in the callee?

I assume that what happens is that when things are inlined, the
compiler then sees that there is no actual uninitialized value, and
that's ok.

But if things aren't inlined, I really hope all compilers already warn
about "look, I'm calling this function with an uninitialized
argument".

IOW, compilers can - and should - obviously take more information into
account when they can see it.

So no, don't warn for things you can actually see are not used.

IOW, you shouldn't warn because of any _syntactic_ issue of it being
an argument to a function. We often use inlining as an actually
semantically meaningful thing, and the compiler should *not* warn for
some theoretical "if this was not inlined, the argument would be used
and be uninitialized" case.

For an example of this kind of "not really used" thing, I could
imagine that some configuration might need a "cookie" model to pair up
actions, and you have a

        void *cookie;

        start(arg, &cookie);
        ....
        end(cookie);

kind of situation.

But then I could imagine that other configurations don't actually need
or use that "end()" thing at all, and would leave "cookie"
uninitialized, because the only valid use would be an inline function
that is empty, and purely there for those *other* configurations.

Again, if the compiler inlines 'end()', and sees that 'cookie' is not
actually used, then no complaint is needed - or valid.

But if 'cookie()' is an actual real function call, and you don't see
the use of it, then it had better warn.

No?

               Linus

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] [RFC] Initialization of unused function parameters
  2022-06-14 20:19       ` Alexander Potapenko
  2022-06-14 20:43         ` Linus Torvalds
@ 2022-06-14 21:40         ` Segher Boessenkool
  2022-06-14 22:08           ` Evgenii Stepanov
  2022-06-15  8:30           ` Alexander Potapenko
  1 sibling, 2 replies; 14+ messages in thread
From: Segher Boessenkool @ 2022-06-14 21:40 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Linus Torvalds, Evgenii Stepanov, Kees Cook, Marco Elver,
	Nathan Chancellor, Nick Desaulniers, Thomas Gleixner,
	Vitaly Buka, Linux Kernel Mailing List, linux-toolchains

Hi!

On Tue, Jun 14, 2022 at 10:19:53PM +0200, Alexander Potapenko wrote:
> ================
> char *kmalloc(int size);
> 
> char *kmalloc_or_not(int flag, int size, char *p) {
>   if (flag)
>     return kmalloc(size);
>   else
>     return p;
> }
> 
> char global[16];
> 
> char *p(int flag) {
>   char *c;
>   int size;
>   if (flag)
>     return kmalloc_or_not(1, 4, c);
>   else
>     return kmalloc_or_not(0, size, global);
> }
> ================

Since C11, lvalue conversion of an automatic variable that does not have
its address taken is explicitly undefined behaviour (6.3.2.1/2).  So in
function "p", both where "c" and where "size" are passed causes UB (so
that executing "p" always causes UB btw).

> In this example `size` is passed into kmalloc_or_not() initialized,
> however it is never used, so the code probably has defined behavior.

No such luck: the passing itself already causes UB.

GCC does not warn, it has already optimised the code to what you expect
by the time this warning is done.  If you use -fno-inline it does warn
for both "c" and "size" (via -Wmaybe-uninitialized).

But it is still UB!  All bets are off, no compiler can do any correct
translation of your program, since there *is none*.


Segher

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] [RFC] Initialization of unused function parameters
  2022-06-14 21:40         ` Segher Boessenkool
@ 2022-06-14 22:08           ` Evgenii Stepanov
  2022-06-15  8:30           ` Alexander Potapenko
  1 sibling, 0 replies; 14+ messages in thread
From: Evgenii Stepanov @ 2022-06-14 22:08 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Alexander Potapenko, Linus Torvalds, Kees Cook, Marco Elver,
	Nathan Chancellor, Nick Desaulniers, Thomas Gleixner,
	Vitaly Buka, Linux Kernel Mailing List, linux-toolchains

On Tue, Jun 14, 2022 at 2:45 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> No such luck: the passing itself already causes UB.
>
> GCC does not warn, it has already optimised the code to what you expect
> by the time this warning is done.  If you use -fno-inline it does warn
> for both "c" and "size" (via -Wmaybe-uninitialized).
>
> But it is still UB!  All bets are off, no compiler can do any correct
> translation of your program, since there *is none*.

Clang also puts "noundef" attribute on most function arguments in the
frontend, and the optimizer can assume that the inputs are fully
initialized. The diagnostic for this is very imperfect.

What Alex is proposing (the KMSAN option) takes place *after*
inlining, so it will only enforce the "semantically meaningful" case
in Linus's words.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] [RFC] Initialization of unused function parameters
  2022-06-14 18:08       ` Nick Desaulniers
@ 2022-06-14 22:27         ` Peter Zijlstra
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2022-06-14 22:27 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Linus Torvalds, Alexander Potapenko, Evgenii Stepanov, Kees Cook,
	Marco Elver, Nathan Chancellor, Thomas Gleixner, Vitaly Buka,
	Linux Kernel Mailing List, linux-toolchains

On Tue, Jun 14, 2022 at 11:08:33AM -0700, Nick Desaulniers wrote:

> One thing I'm curious about; if you have an aggregate in C (struct or
> array) and don't fully initialize the whole object, just
> members/sub-objects, but only use those I assume that's not UB? (Which
> is what my maybe_init example does).  I think that's fine.

Perf does something like that. We only explicitly initialize the first
cacheline of a structure because that's *always* used. Any other field
will only be initialized on request.

At the time (and i've not benchmarked this in a good few years) that was
a significant performance win.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] [RFC] Initialization of unused function parameters
  2022-06-14 21:40         ` Segher Boessenkool
  2022-06-14 22:08           ` Evgenii Stepanov
@ 2022-06-15  8:30           ` Alexander Potapenko
  2022-06-15 16:46             ` Segher Boessenkool
  1 sibling, 1 reply; 14+ messages in thread
From: Alexander Potapenko @ 2022-06-15  8:30 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Linus Torvalds, Evgenii Stepanov, Kees Cook, Marco Elver,
	Nathan Chancellor, Nick Desaulniers, Thomas Gleixner,
	Vitaly Buka, Linux Kernel Mailing List, linux-toolchains

On Tue, Jun 14, 2022 at 11:45 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> Hi!
>
> On Tue, Jun 14, 2022 at 10:19:53PM +0200, Alexander Potapenko wrote:
> > ================
> > char *kmalloc(int size);
> >
> > char *kmalloc_or_not(int flag, int size, char *p) {
> >   if (flag)
> >     return kmalloc(size);
> >   else
> >     return p;
> > }
> >
> > char global[16];
> >
> > char *p(int flag) {
> >   char *c;
> >   int size;
> >   if (flag)
> >     return kmalloc_or_not(1, 4, c);
> >   else
> >     return kmalloc_or_not(0, size, global);
> > }
> > ================
>
> Since C11, lvalue conversion of an automatic variable that does not have
> its address taken is explicitly undefined behaviour (6.3.2.1/2).  So in
> function "p", both where "c" and where "size" are passed causes UB (so
> that executing "p" always causes UB btw).

Thanks for this reference to the standard. I've received another one
off-list, which lets the variables be address-taken:

6.2.4/6: "If an initialization is specified for the object, it is
performed each time the declaration or compound literal is reached in
the execution of the block; otherwise, the value becomes indeterminate
each time the declaration is reached."
3.19.2/1: "indeterminate value: either an unspecified value or a trap
representation"
6.2.6.1/5: "Certain object representations need not represent a value
of the object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does not have
character type, the behavior is undefined. If such a representation is
produced by a side effect that modifies all or any part of the object
by an lvalue expression that does not have character type, the
behavior is undefined. [Footnote: Thus, an automatic variable can be
initialized to a trap representation without causing undefined
behavior, but the value of the variable cannot be used until a proper
value is stored in it.] Such a representation is called a trap
representation."


> > In this example `size` is passed into kmalloc_or_not() initialized,
> > however it is never used, so the code probably has defined behavior.
>
> No such luck: the passing itself already causes UB.

Looks like I've been missing this fact all the time.

> GCC does not warn, it has already optimised the code to what you expect
> by the time this warning is done.  If you use -fno-inline it does warn
> for both "c" and "size" (via -Wmaybe-uninitialized).
>
> But it is still UB!  All bets are off, no compiler can do any correct
> translation of your program, since there *is none*.

Then it makes sense for us to report non-trivial cases where
uninitialized values are actually passed to functions.
As Evgenii mentions, trivial inlinable cases are optimized away before
KMSAN instrumentation kicks in, so we won't be reporting them.

>
> Segher



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

Diese E-Mail ist vertraulich. Falls Sie diese fälschlicherweise
erhalten haben sollten, leiten Sie diese bitte nicht an jemand anderes
weiter, löschen Sie alle Kopien und Anhänge davon und lassen Sie mich
bitte wissen, dass die E-Mail an die falsche Person gesendet wurde.


This e-mail is confidential. If you received this communication by
mistake, please don't forward it to anyone else, please erase all
copies and attachments, and please let me know that it has gone to the
wrong person.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] [RFC] Initialization of unused function parameters
  2022-06-15  8:30           ` Alexander Potapenko
@ 2022-06-15 16:46             ` Segher Boessenkool
  0 siblings, 0 replies; 14+ messages in thread
From: Segher Boessenkool @ 2022-06-15 16:46 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Linus Torvalds, Evgenii Stepanov, Kees Cook, Marco Elver,
	Nathan Chancellor, Nick Desaulniers, Thomas Gleixner,
	Vitaly Buka, Linux Kernel Mailing List, linux-toolchains

Hi!

On Wed, Jun 15, 2022 at 10:30:17AM +0200, Alexander Potapenko wrote:
> On Tue, Jun 14, 2022 at 11:45 PM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> > Since C11, lvalue conversion of an automatic variable that does not have
> > its address taken is explicitly undefined behaviour (6.3.2.1/2).  So in
> > function "p", both where "c" and where "size" are passed causes UB (so
> > that executing "p" always causes UB btw).
> 
> Thanks for this reference to the standard. I've received another one
> off-list, which lets the variables be address-taken:
> 
> 6.2.4/6: "If an initialization is specified for the object, it is
> performed each time the declaration or compound literal is reached in
> the execution of the block; otherwise, the value becomes indeterminate
> each time the declaration is reached."
> 3.19.2/1: "indeterminate value: either an unspecified value or a trap
> representation"
> 6.2.6.1/5: "Certain object representations need not represent a value
> of the object type. If the stored value of an object has such a
> representation and is read by an lvalue expression that does not have
> character type, the behavior is undefined. If such a representation is
> produced by a side effect that modifies all or any part of the object
> by an lvalue expression that does not have character type, the
> behavior is undefined. [Footnote: Thus, an automatic variable can be
> initialized to a trap representation without causing undefined
> behavior, but the value of the variable cannot be used until a proper
> value is stored in it.] Such a representation is called a trap
> representation."

That only affects types that have trap representations though, which
importantly explicitly does not include unsigned types without padding
bits, and unsigned char always.

Some people (on the standards committee) think all uninitialised uses
should be UB.  And some think not.  But since C11 we have this new
explicit UB for automatic variables that don't have their address taken.

(The background of that is the Itanium NaT (not-a-thing) bit in its
registers; without this new clause compilers for Itanium needed to
initialise many things that they now do not, with some readings of the
standard anyway :-) )

> > GCC does not warn, it has already optimised the code to what you expect
> > by the time this warning is done.  If you use -fno-inline it does warn
> > for both "c" and "size" (via -Wmaybe-uninitialized).
> >
> > But it is still UB!  All bets are off, no compiler can do any correct
> > translation of your program, since there *is none*.
> 
> Then it makes sense for us to report non-trivial cases where
> uninitialized values are actually passed to functions.

Yes, and IMO it makes sense to report this to compilers as well, if they
do not yet warn for some simple cases.

Cheers,


Segher

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-06-15 16:54 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-14 14:48 [PATCH] [RFC] Initialization of unused function parameters Alexander Potapenko
2022-06-14 16:48 ` Linus Torvalds
2022-06-14 17:11   ` Nick Desaulniers
2022-06-14 17:24     ` Linus Torvalds
2022-06-14 18:08       ` Nick Desaulniers
2022-06-14 22:27         ` Peter Zijlstra
2022-06-14 18:07   ` Alexander Potapenko
2022-06-14 18:30     ` Linus Torvalds
2022-06-14 20:19       ` Alexander Potapenko
2022-06-14 20:43         ` Linus Torvalds
2022-06-14 21:40         ` Segher Boessenkool
2022-06-14 22:08           ` Evgenii Stepanov
2022-06-15  8:30           ` Alexander Potapenko
2022-06-15 16:46             ` Segher Boessenkool

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.