* [PATCH] [RFC] Initialization of unused function parameters @ 2022-06-14 14:48 Alexander Potapenko 2022-06-14 16:48 ` Linus Torvalds 0 siblings, 1 reply; 14+ messages in thread From: Alexander Potapenko @ 2022-06-14 14:48 UTC (permalink / raw) To: glider Cc: Evgenii Stepanov, Kees Cook, Linus Torvalds, Marco Elver, Nathan Chancellor, Nick Desaulniers, Thomas Gleixner, Vitaly Buka, linux-kernel, linux-toolchains Consider the patch below, where under certain circumstances initialization of `unsigned seq` and `struct inode *inode` passed into step_into() may be skipped. In particular, if the call to lookup_fast() in walk_component() returns NULL, and lookup_slow() returns a valid dentry, then the `seq` and `inode` will remain uninitialized until the call to step_into() (see [1] for more info). Right now step_into() correctly handles this and does not use the uninitialized values, but explicitly initializing them will guarantee their sanity in the case of future changes to step_into(). The bigger question I want to raise here is whether we want to discourage passing uninitialized variables to functions in the kernel altogether. While this is not required by the C standard (see the whole discussion at [2]), KMSAN can help with detecting cases where uninits are passed to functions (there is a more aggressive -fsanitize-memory-param-retval compiler option for that). This is somewhat similar to -Wmaybe-uninitialized, but works dynamically and has no false positives. Would initializing function parameters be a net benefit to the kernel? [1] https://github.com/ClangBuiltLinux/linux/issues/1648#issuecomment-1146608063 [2] https://github.com/ClangBuiltLinux/linux/issues/1648 Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Marco Elver <elver@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vitaly Buka <vitalybuka@google.com> Cc: linux-kernel@vger.kernel.org Cc: linux-toolchains@vger.kernel.org Signed-off-by: Alexander Potapenko <glider@google.com> --- fs/namei.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index 509657fdf4f56..dde370338f5d6 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2001,8 +2001,8 @@ static const char *handle_dots(struct nameidata *nd, int type) static const char *walk_component(struct nameidata *nd, int flags) { struct dentry *dentry; - struct inode *inode; - unsigned seq; + struct inode *inode = NULL; + unsigned seq = 0; /* * "." and ".." are special - ".." especially so because it has * to be able to know about the current root directory and @@ -3349,8 +3349,8 @@ static const char *open_last_lookups(struct nameidata *nd, struct dentry *dir = nd->path.dentry; int open_flag = op->open_flag; bool got_write = false; - unsigned seq; - struct inode *inode; + unsigned seq = 0; + struct inode *inode = NULL; struct dentry *dentry; const char *res; -- 2.36.1.476.g0c4daa206d-goog ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] [RFC] Initialization of unused function parameters 2022-06-14 14:48 [PATCH] [RFC] Initialization of unused function parameters Alexander Potapenko @ 2022-06-14 16:48 ` Linus Torvalds 2022-06-14 17:11 ` Nick Desaulniers 2022-06-14 18:07 ` Alexander Potapenko 0 siblings, 2 replies; 14+ messages in thread From: Linus Torvalds @ 2022-06-14 16:48 UTC (permalink / raw) To: Alexander Potapenko Cc: Evgenii Stepanov, Kees Cook, Marco Elver, Nathan Chancellor, Nick Desaulniers, Thomas Gleixner, Vitaly Buka, Linux Kernel Mailing List, linux-toolchains On Tue, Jun 14, 2022 at 7:49 AM Alexander Potapenko <glider@google.com> wrote: > > The bigger question I want to raise here is whether we want to > discourage passing uninitialized variables to functions in the kernel > altogether. I'm assuming you mean pass by reference. Some functions are really fundamentally about initializing things, and expect uninitialized allocations. Obviously the traditional example of this is "memset()", and that one can be special-cased as obvious, but we probably have a ton of wrapper things like that. IOW, things like just "snprintf()" etc is fundamentally passed an uninitialized buffer, because the whole point is that it will write to that buffer. And no, we don't want to initialize it, since the buffer may be big (on purpose). Now, for *small* things (like that "pointer to inode") that aren't some kind of array or big structure, I think it might be good to perhaps be stricter. But even there we do have cases where we pass things by reference because the function is explicitly designed to initialize the value: the argument isn't really "an argument", it's a "second return value". But always initializing in the caller sounds stupid and counter-productive, since the point is to initialize by calling the helper function (think things like returning a "cookie" or similar: initializing the cookie to NULL in the caller is just plain _wrong_. What I think might be a good model is to be able to mark such arguments as "must be initialized by callee". So then the rule could be that small arguments passed by reference have to be either initialized by the caller, or the argument must have that "initialized by callee" attribute, and then the initialization would be enforced in the callee instead. But making the rule be that the caller *always* has to initialize sounds really wrong to me. Linus ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] [RFC] Initialization of unused function parameters 2022-06-14 16:48 ` Linus Torvalds @ 2022-06-14 17:11 ` Nick Desaulniers 2022-06-14 17:24 ` Linus Torvalds 2022-06-14 18:07 ` Alexander Potapenko 1 sibling, 1 reply; 14+ messages in thread From: Nick Desaulniers @ 2022-06-14 17:11 UTC (permalink / raw) To: Linus Torvalds Cc: Alexander Potapenko, Evgenii Stepanov, Kees Cook, Marco Elver, Nathan Chancellor, Thomas Gleixner, Vitaly Buka, Linux Kernel Mailing List, linux-toolchains On Tue, Jun 14, 2022 at 9:48 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Tue, Jun 14, 2022 at 7:49 AM Alexander Potapenko <glider@google.com> wrote: > > > > The bigger question I want to raise here is whether we want to > > discourage passing uninitialized variables to functions in the kernel > > altogether. > > I'm assuming you mean pass by reference. > > Some functions are really fundamentally about initializing things, and > expect uninitialized allocations. > > Obviously the traditional example of this is "memset()", and that one > can be special-cased as obvious, but we probably have a ton of wrapper > things like that. > > IOW, things like just "snprintf()" etc is fundamentally passed an > uninitialized buffer, because the whole point is that it will write to > that buffer. > > And no, we don't want to initialize it, since the buffer may be big > (on purpose). > > Now, for *small* things (like that "pointer to inode") that aren't > some kind of array or big structure, I think it might be good to > perhaps be stricter. But even there we do have cases where we pass > things by reference because the function is explicitly designed to > initialize the value: the argument isn't really "an argument", it's a > "second return value". > > But always initializing in the caller sounds stupid and > counter-productive, since the point is to initialize by calling the > helper function (think things like returning a "cookie" or similar: > initializing the cookie to NULL in the caller is just plain _wrong_. > > What I think might be a good model is to be able to mark such > arguments as "must be initialized by callee". Yeah, being able to enforce that would be nice. Now that we have clang's static analyzer wired up (commit 6ad7cbc01527 ("Makefile: Add clang-tidy and static analyzer support to makefile")), Intel's 0day bot has been reporting cases it finds for some classes of warnings. There's been a few interesting (to me) cases where these "init" routines would conditionally initialize a "second return value" but the caller either did not do return value checking or the callee was not marked __must_check (or both). As with -Wsometimes-uninitialized, my experience has been that folks consistently get error handling/exceptional cases wrong in so far as passing unitialized values later. Clang's -Wsometimes-uninitialized is intra-proceedural, so doesn't catch the problems with "init" routines. Clang's static analyzer is interproceedural; the trade off being the time the analysis takes. Maybe a new function parameter attribute would be nice? #define __must_init __attribute__((must_init)) int init (int * __must_init x) { // ^ warning: function parameter x marked '__attribute__((must_init))' not unconditionally initialized if (stars_dont_align) { return -42; } *x = 42; return 0; } void foo (void) { int x; init(&x); /* use of x without fear */ } > > So then the rule could be that small arguments passed by reference > have to be either initialized by the caller, or the argument must have > that "initialized by callee" attribute, and then the initialization > would be enforced in the callee instead. > > But making the rule be that the caller *always* has to initialize > sounds really wrong to me. > > Linus -- Thanks, ~Nick Desaulniers ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] [RFC] Initialization of unused function parameters 2022-06-14 17:11 ` Nick Desaulniers @ 2022-06-14 17:24 ` Linus Torvalds 2022-06-14 18:08 ` Nick Desaulniers 0 siblings, 1 reply; 14+ messages in thread From: Linus Torvalds @ 2022-06-14 17:24 UTC (permalink / raw) To: Nick Desaulniers Cc: Alexander Potapenko, Evgenii Stepanov, Kees Cook, Marco Elver, Nathan Chancellor, Thomas Gleixner, Vitaly Buka, Linux Kernel Mailing List, linux-toolchains On Tue, Jun 14, 2022 at 10:11 AM Nick Desaulniers <ndesaulniers@google.com> wrote: > > Maybe a new function parameter attribute would be nice? Right, exactly something like this seems reasonable. > #define __must_init __attribute__((must_init)) > int init (int * __must_init x) { > // ^ warning: function parameter x marked '__attribute__((must_init))' > not unconditionally initialized > if (stars_dont_align) { > return -42; > } > *x = 42; > return 0; > } > void foo (void) { int x; init(&x); /* use of x without fear */ } Yeah. So for this pattern of uninitialized pass-by-reference arguments, we'd get the warning in the callee if it's __must_init, and in the caller if it's not. Now, I suspect that we have a lot of cases where the initializing function returns an error, and we currently don't initialize the pass-by-ref argument in that case. In a perfect world, we'd have some way to annotate that case too, but I suspect it gets too complicated both for users and for the compiler. Error handling in C is ugly, but it's also why we in the kernel have that ERR_PTR() model that solves the "return *both* an error *and* a pointer" case. Which is one of the most common cases we have for this situation. I suspect that the simple "__must_init" model would work well enough for us in practice. Yes, it might make us then initialize things "unnecessarily" in error cases, but that doesn't sound too onerous. And I think the "__must_init" model makes conceptual sense, in ways that the "caller has to initialize things even if it is literally asking another function to initialize the value" model does *not* make sense. But hey, I didn't look at just how painful it would really be. This is all "I _think_ that would work really well for the kernel" without any actual data to back it up with. Linus ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] [RFC] Initialization of unused function parameters 2022-06-14 17:24 ` Linus Torvalds @ 2022-06-14 18:08 ` Nick Desaulniers 2022-06-14 22:27 ` Peter Zijlstra 0 siblings, 1 reply; 14+ messages in thread From: Nick Desaulniers @ 2022-06-14 18:08 UTC (permalink / raw) To: Linus Torvalds Cc: Alexander Potapenko, Evgenii Stepanov, Kees Cook, Marco Elver, Nathan Chancellor, Thomas Gleixner, Vitaly Buka, Linux Kernel Mailing List, linux-toolchains On Tue, Jun 14, 2022 at 10:24 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Tue, Jun 14, 2022 at 10:11 AM Nick Desaulniers > <ndesaulniers@google.com> wrote: > > > > Maybe a new function parameter attribute would be nice? > > Right, exactly something like this seems reasonable. > > > #define __must_init __attribute__((must_init)) > > int init (int * __must_init x) { > > // ^ warning: function parameter x marked '__attribute__((must_init))' > > not unconditionally initialized > > if (stars_dont_align) { > > return -42; > > } > > *x = 42; > > return 0; > > } > > void foo (void) { int x; init(&x); /* use of x without fear */ } Thinking more about your snprintf example which is potentially more costly than initializing just one value...here's another case for us to consider. int maybe_init (char* buf) { if (stars_align) return -42; buf[42] = 0; return 0; } char foo (void) { char buf [PATH_MAX]; maybe_init(buf); return char[42]; } I'm thinking the attribute would have to go on the pointed to type not the pointer, i.e. `__must_init char *` not `char * __must_init`. Similarly, you'd need to unconditionally initialize all of buf which might be painful. __must_init would not give you the specificity to differentiate between "this whole buffer must be initialized" vs "only index 42 need be initialized." I _think_ that's fine, perhaps "only index 42 need be initialized" is YAGNI and you could just _not_ use __must_init for such a case. One thing I'm curious about; if you have an aggregate in C (struct or array) and don't fully initialize the whole object, just members/sub-objects, but only use those I assume that's not UB? (Which is what my maybe_init example does). I think that's fine. Another thing that makes me uncertain about my maybe_init example above is decay-to-pointer, and the compiler's ability to track things like __builtin_object_size precisely (or across translation units); Kees is having a dog of a time with __builtin_object_size of structs that contain flexible array members for example. If a function accepts a `__must_init struct foo*`, can we know if we were passed an array of struct foo* vs just one? What if `struct foo` is an opaque type; then the callee can't verify that all members of the struct have been initialized (at least designated initialized wouldn't work; memcpy should though...can you get the sizeof an opaque type though?) But maybe I'm getting ahead of myself and am just describing good unit tests when building such a feature. Probably worth prototyping to get a sense of the ergonomics and limitations. But it doesn't sound too controversial, which is probably good enough to get started. I'm sure Alexander and team will have additional ideas or proposals for achieving the same outcome. -- Thanks, ~Nick Desaulniers ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] [RFC] Initialization of unused function parameters 2022-06-14 18:08 ` Nick Desaulniers @ 2022-06-14 22:27 ` Peter Zijlstra 0 siblings, 0 replies; 14+ messages in thread From: Peter Zijlstra @ 2022-06-14 22:27 UTC (permalink / raw) To: Nick Desaulniers Cc: Linus Torvalds, Alexander Potapenko, Evgenii Stepanov, Kees Cook, Marco Elver, Nathan Chancellor, Thomas Gleixner, Vitaly Buka, Linux Kernel Mailing List, linux-toolchains On Tue, Jun 14, 2022 at 11:08:33AM -0700, Nick Desaulniers wrote: > One thing I'm curious about; if you have an aggregate in C (struct or > array) and don't fully initialize the whole object, just > members/sub-objects, but only use those I assume that's not UB? (Which > is what my maybe_init example does). I think that's fine. Perf does something like that. We only explicitly initialize the first cacheline of a structure because that's *always* used. Any other field will only be initialized on request. At the time (and i've not benchmarked this in a good few years) that was a significant performance win. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] [RFC] Initialization of unused function parameters 2022-06-14 16:48 ` Linus Torvalds 2022-06-14 17:11 ` Nick Desaulniers @ 2022-06-14 18:07 ` Alexander Potapenko 2022-06-14 18:30 ` Linus Torvalds 1 sibling, 1 reply; 14+ messages in thread From: Alexander Potapenko @ 2022-06-14 18:07 UTC (permalink / raw) To: Linus Torvalds Cc: Evgenii Stepanov, Kees Cook, Marco Elver, Nathan Chancellor, Nick Desaulniers, Thomas Gleixner, Vitaly Buka, Linux Kernel Mailing List, linux-toolchains On Tue, Jun 14, 2022 at 6:48 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Tue, Jun 14, 2022 at 7:49 AM Alexander Potapenko <glider@google.com> wrote: > > > > The bigger question I want to raise here is whether we want to > > discourage passing uninitialized variables to functions in the kernel > > altogether. > > I'm assuming you mean pass by reference. No, sorry for being unclear. I mean passing by value. In the given example the prototype of step_into looks as follows (see https://elixir.bootlin.com/linux/latest/source/fs/namei.c#L1846): static const char *step_into(struct nameidata *nd, int flags, struct dentry *dentry, struct inode *inode, unsigned seq); , and the local variables `struct inode *inode` and `unsigned seq` are being passed to it by value, i.e. in certain cases the struct inode pointer and the unsigned seq are uninitialized. Does that change anything? > Some functions are really fundamentally about initializing things, and > expect uninitialized allocations. Agreed, there are a lot of functions around that initialize one struct or another, they are out of the scope. > What I think might be a good model is to be able to mark such > arguments as "must be initialized by callee". This sounds interesting. In the given example I would suggest that the call to lookup_fast() (https://elixir.bootlin.com/linux/latest/source/fs/namei.c#L2016) should be initializing &inode and &seq, so that it is guaranteed that they are passed initialized into step_into(). -- Alexander Potapenko Software Engineer Google Germany GmbH Erika-Mann-Straße, 33 80636 München Geschäftsführer: Paul Manicle, Liana Sebastian Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg Diese E-Mail ist vertraulich. Falls Sie diese fälschlicherweise erhalten haben sollten, leiten Sie diese bitte nicht an jemand anderes weiter, löschen Sie alle Kopien und Anhänge davon und lassen Sie mich bitte wissen, dass die E-Mail an die falsche Person gesendet wurde. This e-mail is confidential. If you received this communication by mistake, please don't forward it to anyone else, please erase all copies and attachments, and please let me know that it has gone to the wrong person. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] [RFC] Initialization of unused function parameters 2022-06-14 18:07 ` Alexander Potapenko @ 2022-06-14 18:30 ` Linus Torvalds 2022-06-14 20:19 ` Alexander Potapenko 0 siblings, 1 reply; 14+ messages in thread From: Linus Torvalds @ 2022-06-14 18:30 UTC (permalink / raw) To: Alexander Potapenko Cc: Evgenii Stepanov, Kees Cook, Marco Elver, Nathan Chancellor, Nick Desaulniers, Thomas Gleixner, Vitaly Buka, Linux Kernel Mailing List, linux-toolchains On Tue, Jun 14, 2022 at 11:08 AM Alexander Potapenko <glider@google.com> wrote: > > On Tue, Jun 14, 2022 at 6:48 PM Linus Torvalds > > > > I'm assuming you mean pass by reference. > > No, sorry for being unclear. I mean passing by value. Pass-by-value most definitely should warn about uninitialized variables. > In the given example the prototype of step_into looks as follows (see > https://elixir.bootlin.com/linux/latest/source/fs/namei.c#L1846): > > static const char *step_into(struct nameidata *nd, int flags, struct > dentry *dentry, struct inode *inode, unsigned seq); > > , and the local variables `struct inode *inode` and `unsigned seq` are > being passed to it by value, i.e. in certain cases the struct inode > pointer and the unsigned seq are uninitialized. Then those cases should warn. No question about it. I assume the only reason they don't warn right now is that the compiler doesn't see that they are uninitialized, possibly due to some earlier pass-by-reference use. Linus ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] [RFC] Initialization of unused function parameters 2022-06-14 18:30 ` Linus Torvalds @ 2022-06-14 20:19 ` Alexander Potapenko 2022-06-14 20:43 ` Linus Torvalds 2022-06-14 21:40 ` Segher Boessenkool 0 siblings, 2 replies; 14+ messages in thread From: Alexander Potapenko @ 2022-06-14 20:19 UTC (permalink / raw) To: Linus Torvalds Cc: Evgenii Stepanov, Kees Cook, Marco Elver, Nathan Chancellor, Nick Desaulniers, Thomas Gleixner, Vitaly Buka, Linux Kernel Mailing List, linux-toolchains On Tue, Jun 14, 2022 at 8:31 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Tue, Jun 14, 2022 at 11:08 AM Alexander Potapenko <glider@google.com> wrote: > > > > On Tue, Jun 14, 2022 at 6:48 PM Linus Torvalds > > > > > > I'm assuming you mean pass by reference. > > > > No, sorry for being unclear. I mean passing by value. > > Pass-by-value most definitely should warn about uninitialized variables. > > > In the given example the prototype of step_into looks as follows (see > > https://elixir.bootlin.com/linux/latest/source/fs/namei.c#L1846): > > > > static const char *step_into(struct nameidata *nd, int flags, struct > > dentry *dentry, struct inode *inode, unsigned seq); > > > > , and the local variables `struct inode *inode` and `unsigned seq` are > > being passed to it by value, i.e. in certain cases the struct inode > > pointer and the unsigned seq are uninitialized. > > Then those cases should warn. No question about it. What about the cases where these uninitialized values are never used in the callee? step_into() is one of the instances from the kernel, but here is a distilled example from https://godbolt.org/z/s1oPve6d4: ================ char *kmalloc(int size); char *kmalloc_or_not(int flag, int size, char *p) { if (flag) return kmalloc(size); else return p; } char global[16]; char *p(int flag) { char *c; int size; if (flag) return kmalloc_or_not(1, 4, c); else return kmalloc_or_not(0, size, global); } ================ In this example `size` is passed into kmalloc_or_not() initialized, however it is never used, so the code probably has defined behavior. In this particular case Clang's -Winitialized is able to notice that `size` is uninitialized, but in more complex cases it cannot. > I assume the only reason they don't warn right now is that the > compiler doesn't see that they are uninitialized, possibly due to some > earlier pass-by-reference use. That's right, and here is where dynamic analysis comes to the rescue. So should we let KMSAN catch such cases and consider them bugs^W smelly code patterns that need to be fixed? > > Linus -- Alexander Potapenko Software Engineer Google Germany GmbH Erika-Mann-Straße, 33 80636 München Geschäftsführer: Paul Manicle, Liana Sebastian Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg Diese E-Mail ist vertraulich. Falls Sie diese fälschlicherweise erhalten haben sollten, leiten Sie diese bitte nicht an jemand anderes weiter, löschen Sie alle Kopien und Anhänge davon und lassen Sie mich bitte wissen, dass die E-Mail an die falsche Person gesendet wurde. This e-mail is confidential. If you received this communication by mistake, please don't forward it to anyone else, please erase all copies and attachments, and please let me know that it has gone to the wrong person. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] [RFC] Initialization of unused function parameters 2022-06-14 20:19 ` Alexander Potapenko @ 2022-06-14 20:43 ` Linus Torvalds 2022-06-14 21:40 ` Segher Boessenkool 1 sibling, 0 replies; 14+ messages in thread From: Linus Torvalds @ 2022-06-14 20:43 UTC (permalink / raw) To: Alexander Potapenko Cc: Evgenii Stepanov, Kees Cook, Marco Elver, Nathan Chancellor, Nick Desaulniers, Thomas Gleixner, Vitaly Buka, Linux Kernel Mailing List, linux-toolchains On Tue, Jun 14, 2022 at 1:20 PM Alexander Potapenko <glider@google.com> wrote: > > What about the cases where these uninitialized values are never used > in the callee? I assume that what happens is that when things are inlined, the compiler then sees that there is no actual uninitialized value, and that's ok. But if things aren't inlined, I really hope all compilers already warn about "look, I'm calling this function with an uninitialized argument". IOW, compilers can - and should - obviously take more information into account when they can see it. So no, don't warn for things you can actually see are not used. IOW, you shouldn't warn because of any _syntactic_ issue of it being an argument to a function. We often use inlining as an actually semantically meaningful thing, and the compiler should *not* warn for some theoretical "if this was not inlined, the argument would be used and be uninitialized" case. For an example of this kind of "not really used" thing, I could imagine that some configuration might need a "cookie" model to pair up actions, and you have a void *cookie; start(arg, &cookie); .... end(cookie); kind of situation. But then I could imagine that other configurations don't actually need or use that "end()" thing at all, and would leave "cookie" uninitialized, because the only valid use would be an inline function that is empty, and purely there for those *other* configurations. Again, if the compiler inlines 'end()', and sees that 'cookie' is not actually used, then no complaint is needed - or valid. But if 'cookie()' is an actual real function call, and you don't see the use of it, then it had better warn. No? Linus ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] [RFC] Initialization of unused function parameters 2022-06-14 20:19 ` Alexander Potapenko 2022-06-14 20:43 ` Linus Torvalds @ 2022-06-14 21:40 ` Segher Boessenkool 2022-06-14 22:08 ` Evgenii Stepanov 2022-06-15 8:30 ` Alexander Potapenko 1 sibling, 2 replies; 14+ messages in thread From: Segher Boessenkool @ 2022-06-14 21:40 UTC (permalink / raw) To: Alexander Potapenko Cc: Linus Torvalds, Evgenii Stepanov, Kees Cook, Marco Elver, Nathan Chancellor, Nick Desaulniers, Thomas Gleixner, Vitaly Buka, Linux Kernel Mailing List, linux-toolchains Hi! On Tue, Jun 14, 2022 at 10:19:53PM +0200, Alexander Potapenko wrote: > ================ > char *kmalloc(int size); > > char *kmalloc_or_not(int flag, int size, char *p) { > if (flag) > return kmalloc(size); > else > return p; > } > > char global[16]; > > char *p(int flag) { > char *c; > int size; > if (flag) > return kmalloc_or_not(1, 4, c); > else > return kmalloc_or_not(0, size, global); > } > ================ Since C11, lvalue conversion of an automatic variable that does not have its address taken is explicitly undefined behaviour (6.3.2.1/2). So in function "p", both where "c" and where "size" are passed causes UB (so that executing "p" always causes UB btw). > In this example `size` is passed into kmalloc_or_not() initialized, > however it is never used, so the code probably has defined behavior. No such luck: the passing itself already causes UB. GCC does not warn, it has already optimised the code to what you expect by the time this warning is done. If you use -fno-inline it does warn for both "c" and "size" (via -Wmaybe-uninitialized). But it is still UB! All bets are off, no compiler can do any correct translation of your program, since there *is none*. Segher ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] [RFC] Initialization of unused function parameters 2022-06-14 21:40 ` Segher Boessenkool @ 2022-06-14 22:08 ` Evgenii Stepanov 2022-06-15 8:30 ` Alexander Potapenko 1 sibling, 0 replies; 14+ messages in thread From: Evgenii Stepanov @ 2022-06-14 22:08 UTC (permalink / raw) To: Segher Boessenkool Cc: Alexander Potapenko, Linus Torvalds, Kees Cook, Marco Elver, Nathan Chancellor, Nick Desaulniers, Thomas Gleixner, Vitaly Buka, Linux Kernel Mailing List, linux-toolchains On Tue, Jun 14, 2022 at 2:45 PM Segher Boessenkool <segher@kernel.crashing.org> wrote: > No such luck: the passing itself already causes UB. > > GCC does not warn, it has already optimised the code to what you expect > by the time this warning is done. If you use -fno-inline it does warn > for both "c" and "size" (via -Wmaybe-uninitialized). > > But it is still UB! All bets are off, no compiler can do any correct > translation of your program, since there *is none*. Clang also puts "noundef" attribute on most function arguments in the frontend, and the optimizer can assume that the inputs are fully initialized. The diagnostic for this is very imperfect. What Alex is proposing (the KMSAN option) takes place *after* inlining, so it will only enforce the "semantically meaningful" case in Linus's words. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] [RFC] Initialization of unused function parameters 2022-06-14 21:40 ` Segher Boessenkool 2022-06-14 22:08 ` Evgenii Stepanov @ 2022-06-15 8:30 ` Alexander Potapenko 2022-06-15 16:46 ` Segher Boessenkool 1 sibling, 1 reply; 14+ messages in thread From: Alexander Potapenko @ 2022-06-15 8:30 UTC (permalink / raw) To: Segher Boessenkool Cc: Linus Torvalds, Evgenii Stepanov, Kees Cook, Marco Elver, Nathan Chancellor, Nick Desaulniers, Thomas Gleixner, Vitaly Buka, Linux Kernel Mailing List, linux-toolchains On Tue, Jun 14, 2022 at 11:45 PM Segher Boessenkool <segher@kernel.crashing.org> wrote: > > Hi! > > On Tue, Jun 14, 2022 at 10:19:53PM +0200, Alexander Potapenko wrote: > > ================ > > char *kmalloc(int size); > > > > char *kmalloc_or_not(int flag, int size, char *p) { > > if (flag) > > return kmalloc(size); > > else > > return p; > > } > > > > char global[16]; > > > > char *p(int flag) { > > char *c; > > int size; > > if (flag) > > return kmalloc_or_not(1, 4, c); > > else > > return kmalloc_or_not(0, size, global); > > } > > ================ > > Since C11, lvalue conversion of an automatic variable that does not have > its address taken is explicitly undefined behaviour (6.3.2.1/2). So in > function "p", both where "c" and where "size" are passed causes UB (so > that executing "p" always causes UB btw). Thanks for this reference to the standard. I've received another one off-list, which lets the variables be address-taken: 6.2.4/6: "If an initialization is specified for the object, it is performed each time the declaration or compound literal is reached in the execution of the block; otherwise, the value becomes indeterminate each time the declaration is reached." 3.19.2/1: "indeterminate value: either an unspecified value or a trap representation" 6.2.6.1/5: "Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. [Footnote: Thus, an automatic variable can be initialized to a trap representation without causing undefined behavior, but the value of the variable cannot be used until a proper value is stored in it.] Such a representation is called a trap representation." > > In this example `size` is passed into kmalloc_or_not() initialized, > > however it is never used, so the code probably has defined behavior. > > No such luck: the passing itself already causes UB. Looks like I've been missing this fact all the time. > GCC does not warn, it has already optimised the code to what you expect > by the time this warning is done. If you use -fno-inline it does warn > for both "c" and "size" (via -Wmaybe-uninitialized). > > But it is still UB! All bets are off, no compiler can do any correct > translation of your program, since there *is none*. Then it makes sense for us to report non-trivial cases where uninitialized values are actually passed to functions. As Evgenii mentions, trivial inlinable cases are optimized away before KMSAN instrumentation kicks in, so we won't be reporting them. > > Segher -- Alexander Potapenko Software Engineer Google Germany GmbH Erika-Mann-Straße, 33 80636 München Geschäftsführer: Paul Manicle, Liana Sebastian Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg Diese E-Mail ist vertraulich. Falls Sie diese fälschlicherweise erhalten haben sollten, leiten Sie diese bitte nicht an jemand anderes weiter, löschen Sie alle Kopien und Anhänge davon und lassen Sie mich bitte wissen, dass die E-Mail an die falsche Person gesendet wurde. This e-mail is confidential. If you received this communication by mistake, please don't forward it to anyone else, please erase all copies and attachments, and please let me know that it has gone to the wrong person. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] [RFC] Initialization of unused function parameters 2022-06-15 8:30 ` Alexander Potapenko @ 2022-06-15 16:46 ` Segher Boessenkool 0 siblings, 0 replies; 14+ messages in thread From: Segher Boessenkool @ 2022-06-15 16:46 UTC (permalink / raw) To: Alexander Potapenko Cc: Linus Torvalds, Evgenii Stepanov, Kees Cook, Marco Elver, Nathan Chancellor, Nick Desaulniers, Thomas Gleixner, Vitaly Buka, Linux Kernel Mailing List, linux-toolchains Hi! On Wed, Jun 15, 2022 at 10:30:17AM +0200, Alexander Potapenko wrote: > On Tue, Jun 14, 2022 at 11:45 PM Segher Boessenkool > <segher@kernel.crashing.org> wrote: > > Since C11, lvalue conversion of an automatic variable that does not have > > its address taken is explicitly undefined behaviour (6.3.2.1/2). So in > > function "p", both where "c" and where "size" are passed causes UB (so > > that executing "p" always causes UB btw). > > Thanks for this reference to the standard. I've received another one > off-list, which lets the variables be address-taken: > > 6.2.4/6: "If an initialization is specified for the object, it is > performed each time the declaration or compound literal is reached in > the execution of the block; otherwise, the value becomes indeterminate > each time the declaration is reached." > 3.19.2/1: "indeterminate value: either an unspecified value or a trap > representation" > 6.2.6.1/5: "Certain object representations need not represent a value > of the object type. If the stored value of an object has such a > representation and is read by an lvalue expression that does not have > character type, the behavior is undefined. If such a representation is > produced by a side effect that modifies all or any part of the object > by an lvalue expression that does not have character type, the > behavior is undefined. [Footnote: Thus, an automatic variable can be > initialized to a trap representation without causing undefined > behavior, but the value of the variable cannot be used until a proper > value is stored in it.] Such a representation is called a trap > representation." That only affects types that have trap representations though, which importantly explicitly does not include unsigned types without padding bits, and unsigned char always. Some people (on the standards committee) think all uninitialised uses should be UB. And some think not. But since C11 we have this new explicit UB for automatic variables that don't have their address taken. (The background of that is the Itanium NaT (not-a-thing) bit in its registers; without this new clause compilers for Itanium needed to initialise many things that they now do not, with some readings of the standard anyway :-) ) > > GCC does not warn, it has already optimised the code to what you expect > > by the time this warning is done. If you use -fno-inline it does warn > > for both "c" and "size" (via -Wmaybe-uninitialized). > > > > But it is still UB! All bets are off, no compiler can do any correct > > translation of your program, since there *is none*. > > Then it makes sense for us to report non-trivial cases where > uninitialized values are actually passed to functions. Yes, and IMO it makes sense to report this to compilers as well, if they do not yet warn for some simple cases. Cheers, Segher ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2022-06-15 16:54 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-06-14 14:48 [PATCH] [RFC] Initialization of unused function parameters Alexander Potapenko 2022-06-14 16:48 ` Linus Torvalds 2022-06-14 17:11 ` Nick Desaulniers 2022-06-14 17:24 ` Linus Torvalds 2022-06-14 18:08 ` Nick Desaulniers 2022-06-14 22:27 ` Peter Zijlstra 2022-06-14 18:07 ` Alexander Potapenko 2022-06-14 18:30 ` Linus Torvalds 2022-06-14 20:19 ` Alexander Potapenko 2022-06-14 20:43 ` Linus Torvalds 2022-06-14 21:40 ` Segher Boessenkool 2022-06-14 22:08 ` Evgenii Stepanov 2022-06-15 8:30 ` Alexander Potapenko 2022-06-15 16:46 ` Segher Boessenkool
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.