llvm.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
* Re: [GIT PULL] Block fixes for 6.3-rc3
       [not found] <9d0ef355-f430-e8e2-c844-b34cfcf60d88@kernel.dk>
@ 2023-03-17 18:35 ` Linus Torvalds
  2023-03-17 18:50   ` Linus Torvalds
                     ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Linus Torvalds @ 2023-03-17 18:35 UTC (permalink / raw)
  To: Jens Axboe, Nathan Chancellor, Nick Desaulniers, Kees Cook
  Cc: linux-block, clang-built-linux

On Fri, Mar 17, 2023 at 10:16 AM Jens Axboe <axboe@kernel.dk> wrote:
>
> - blk-mq SRCU fix for BLK_MQ_F_BLOCKING devices (Christ)

Christ indeed.

But I think you meant "Chris".

> Side note - when doing the usual allmodconfig builds with gcc-12 and
> clang before sending them out, for the latter I see this warning being
> spewed with clang-15:
>
> drivers/media/i2c/m5mols/m5mols.o: warning: objtool: m5mols_set_fmt() falls through to next function m5mols_get_frame_desc()
>
> Obviously not related to my changes, but mentioning it in case it has
> been missed as I know you love squeaky clean builds :-). Doesn't happen
> with clang-14.

Hmm. I have clang-15 too, but I do the allmodconfig builds with gcc,
and only my own "normal config" builds with clang.

So I don't see this particular issue and my builds are still squeaky clean.

That said, when I explicitly try that allmodconfig thing with clang, I
can see it too. And the reason seems to be something we've seen
before: UBSAN functions being considered non-return by clang, so clang
generates code like this:

   ....
.LBB24_3:
        callq   __sanitizer_cov_trace_pc@PLT
        movl    $2, %esi
        movq    $.L__unnamed_3, %rdi
        callq   __ubsan_handle_out_of_bounds
.Lfunc_end24:
        .size   m5mols_set_fmt, .Lfunc_end24-m5mols_set_fmt

ie the last thing in that m5mols_set_fmt() function is a call to
__ubsan_handle_out_of_bounds, and then it "falls through" to the next
function.

And yes, I absolutely *detest* how clang does that. Not only does it
cause objtool sanity checking issues, it fundamentally means that we
can never treat UBSAN warnings as warnings. They are always fatal.

This is a *huge* clang mis-feature, and I forget what we decided last
that we saw it.

But I suspect we need to disable UBSAN for clang, because clang gets
this so *horribly* wrong.

Fatal errors that cannot be recovered from are not something that the
compiler is supposed to decide on. It's exactly the same issue as
BUG() calls: it just results in a dead machine, and in the process the
actual problem easily gets lost (because maybe this only happens while
running X, and no serial console, and no way to actually see what the
UBSAN warning was as a result).

I really really detest this thing, and I think this is a fatal flaw,
and means that as-is, UBSAN really *has* to be disabled for clang
kernel builds. Maybe that will make somebody wake up and smell the
roses, and stop this idiotic "undefined behavior is fatal" garbage.

Nick? Do you remember what the fix was last time?

               Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] Block fixes for 6.3-rc3
  2023-03-17 18:35 ` [GIT PULL] Block fixes for 6.3-rc3 Linus Torvalds
@ 2023-03-17 18:50   ` Linus Torvalds
  2023-03-17 20:31     ` Miguel Ojeda
  2023-03-17 18:59   ` Nick Desaulniers
  2023-03-17 19:44   ` Jens Axboe
  2 siblings, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2023-03-17 18:50 UTC (permalink / raw)
  To: Jens Axboe, Nathan Chancellor, Nick Desaulniers, Kees Cook
  Cc: linux-block, clang-built-linux

On Fri, Mar 17, 2023 at 11:35 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I really really detest this thing, and I think this is a fatal flaw,
> and means that as-is, UBSAN really *has* to be disabled for clang
> kernel builds. Maybe that will make somebody wake up and smell the
> roses, and stop this idiotic "undefined behavior is fatal" garbage.

Btw, this is not at all kernel-centric, even if the kernel may be
special in not necessarily being able to log things on fatal issues.

Imagine you are a database vendor, and you cannot replicate something
that a user sees, and the user is not willing to give you their
database, and you worry that it might be some undefined thing.

Are you going to ship the user a test-build that might die in the
middle, and thus be unable to actually show the behavior in any
half-way production setting? Would any sane user run that pile of
garbage on their machines, knowing that any error would be fatal and
result in their production database being broken?

So any compiler person that says "undefined behavior can do anything
at all according to the standard, so we're compliant with that" is an
incompetent person, and shouldn't be let near a real compiler.

Debug support *MUST* have an option to just continue. Sure, make "this
is fatal" be an option *too*, because if you are a developer, maybe
you really want the "fail hard, fail quickly" behavior.

But that fatal option cannot be the *only* choice, or we cannot use
said debug code in the kernel.

             Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] Block fixes for 6.3-rc3
  2023-03-17 18:35 ` [GIT PULL] Block fixes for 6.3-rc3 Linus Torvalds
  2023-03-17 18:50   ` Linus Torvalds
@ 2023-03-17 18:59   ` Nick Desaulniers
  2023-03-17 19:50     ` Kees Cook
  2023-03-17 19:44   ` Jens Axboe
  2 siblings, 1 reply; 16+ messages in thread
From: Nick Desaulniers @ 2023-03-17 18:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jens Axboe, Nathan Chancellor, Kees Cook, linux-block, clang-built-linux

+Sanitizer folks (BCC'd)
(Top of lore thread:
https://lore.kernel.org/linux-block/9d0ef355-f430-e8e2-c844-b34cfcf60d88@kernel.dk/)

On Fri, Mar 17, 2023 at 11:35 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Fri, Mar 17, 2023 at 10:16 AM Jens Axboe <axboe@kernel.dk> wrote:
> >
> > - blk-mq SRCU fix for BLK_MQ_F_BLOCKING devices (Christ)
>
> Christ indeed.
>
> But I think you meant "Chris".
>
> > Side note - when doing the usual allmodconfig builds with gcc-12 and
> > clang before sending them out, for the latter I see this warning being
> > spewed with clang-15:
> >
> > drivers/media/i2c/m5mols/m5mols.o: warning: objtool: m5mols_set_fmt() falls through to next function m5mols_get_frame_desc()
> >
> > Obviously not related to my changes, but mentioning it in case it has
> > been missed as I know you love squeaky clean builds :-). Doesn't happen
> > with clang-14.
>
> Hmm. I have clang-15 too, but I do the allmodconfig builds with gcc,
> and only my own "normal config" builds with clang.
>
> So I don't see this particular issue and my builds are still squeaky clean.
>
> That said, when I explicitly try that allmodconfig thing with clang, I
> can see it too. And the reason seems to be something we've seen
> before: UBSAN functions being considered non-return by clang, so clang
> generates code like this:
>
>    ....
> .LBB24_3:
>         callq   __sanitizer_cov_trace_pc@PLT
>         movl    $2, %esi
>         movq    $.L__unnamed_3, %rdi
>         callq   __ubsan_handle_out_of_bounds
> .Lfunc_end24:
>         .size   m5mols_set_fmt, .Lfunc_end24-m5mols_set_fmt
>
> ie the last thing in that m5mols_set_fmt() function is a call to
> __ubsan_handle_out_of_bounds, and then it "falls through" to the next
> function.
>
> And yes, I absolutely *detest* how clang does that. Not only does it
> cause objtool sanity checking issues, it fundamentally means that we
> can never treat UBSAN warnings as warnings. They are always fatal.
>
> This is a *huge* clang mis-feature, and I forget what we decided last
> that we saw it.
>
> But I suspect we need to disable UBSAN for clang, because clang gets
> this so *horribly* wrong.
>
> Fatal errors that cannot be recovered from are not something that the
> compiler is supposed to decide on. It's exactly the same issue as
> BUG() calls: it just results in a dead machine, and in the process the
> actual problem easily gets lost (because maybe this only happens while
> running X, and no serial console, and no way to actually see what the
> UBSAN warning was as a result).
>
> I really really detest this thing, and I think this is a fatal flaw,
> and means that as-is, UBSAN really *has* to be disabled for clang
> kernel builds. Maybe that will make somebody wake up and smell the
> roses, and stop this idiotic "undefined behavior is fatal" garbage.
>
> Nick? Do you remember what the fix was last time?
>
>                Linus



-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] Block fixes for 6.3-rc3
  2023-03-17 18:35 ` [GIT PULL] Block fixes for 6.3-rc3 Linus Torvalds
  2023-03-17 18:50   ` Linus Torvalds
  2023-03-17 18:59   ` Nick Desaulniers
@ 2023-03-17 19:44   ` Jens Axboe
  2 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2023-03-17 19:44 UTC (permalink / raw)
  To: Linus Torvalds, Nathan Chancellor, Nick Desaulniers, Kees Cook
  Cc: linux-block, clang-built-linux

On 3/17/23 12:35?PM, Linus Torvalds wrote:
> On Fri, Mar 17, 2023 at 10:16?AM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> - blk-mq SRCU fix for BLK_MQ_F_BLOCKING devices (Christ)
> 
> Christ indeed.
> 
> But I think you meant "Chris".

Oops yes indeed, good catch. Though I'd love to think that if we did
have a resurrection of that nature, blk-mq would be high on the list of
things to hack on.

>> Side note - when doing the usual allmodconfig builds with gcc-12 and
>> clang before sending them out, for the latter I see this warning being
>> spewed with clang-15:
>>
>> drivers/media/i2c/m5mols/m5mols.o: warning: objtool: m5mols_set_fmt() falls through to next function m5mols_get_frame_desc()
>>
>> Obviously not related to my changes, but mentioning it in case it has
>> been missed as I know you love squeaky clean builds :-). Doesn't happen
>> with clang-14.
> 
> Hmm. I have clang-15 too, but I do the allmodconfig builds with gcc,
> and only my own "normal config" builds with clang.

I just do both to avoid anything odd, it's just a few min each.

> So I don't see this particular issue and my builds are still squeaky clean.
> 
> That said, when I explicitly try that allmodconfig thing with clang, I
> can see it too. And the reason seems to be something we've seen
> before: UBSAN functions being considered non-return by clang, so clang
> generates code like this:
> 
>    ....
> .LBB24_3:
>         callq   __sanitizer_cov_trace_pc@PLT
>         movl    $2, %esi
>         movq    $.L__unnamed_3, %rdi
>         callq   __ubsan_handle_out_of_bounds
> .Lfunc_end24:
>         .size   m5mols_set_fmt, .Lfunc_end24-m5mols_set_fmt
> 
> ie the last thing in that m5mols_set_fmt() function is a call to
> __ubsan_handle_out_of_bounds, and then it "falls through" to the next
> function.
> 
> And yes, I absolutely *detest* how clang does that. Not only does it
> cause objtool sanity checking issues, it fundamentally means that we
> can never treat UBSAN warnings as warnings. They are always fatal.
> 
> This is a *huge* clang mis-feature, and I forget what we decided last
> that we saw it.

Gotcha, thanks for digging into that. I must admit I didn't look any
further at it, but figured it was worth reporting...

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] Block fixes for 6.3-rc3
  2023-03-17 18:59   ` Nick Desaulniers
@ 2023-03-17 19:50     ` Kees Cook
  2023-03-17 20:30       ` Linus Torvalds
  2023-03-17 20:35       ` Nick Desaulniers
  0 siblings, 2 replies; 16+ messages in thread
From: Kees Cook @ 2023-03-17 19:50 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Linus Torvalds, Jens Axboe, Nathan Chancellor, linux-block,
	clang-built-linux, linux-hardening

On Fri, Mar 17, 2023 at 11:59:11AM -0700, Nick Desaulniers wrote:
> +Sanitizer folks (BCC'd)
> (Top of lore thread:
> https://lore.kernel.org/linux-block/9d0ef355-f430-e8e2-c844-b34cfcf60d88@kernel.dk/)
> 
> On Fri, Mar 17, 2023 at 11:35 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > On Fri, Mar 17, 2023 at 10:16 AM Jens Axboe <axboe@kernel.dk> wrote:
> > > Side note - when doing the usual allmodconfig builds with gcc-12 and
> > > clang before sending them out, for the latter I see this warning being
> > > spewed with clang-15:
> > >
> > > drivers/media/i2c/m5mols/m5mols.o: warning: objtool: m5mols_set_fmt() falls through to next function m5mols_get_frame_desc()
> > >
> > > Obviously not related to my changes, but mentioning it in case it has
> > > been missed as I know you love squeaky clean builds :-). Doesn't happen
> > > with clang-14.
> >
> > Hmm. I have clang-15 too, but I do the allmodconfig builds with gcc,
> > and only my own "normal config" builds with clang.
> >
> > So I don't see this particular issue and my builds are still squeaky clean.
> >
> > That said, when I explicitly try that allmodconfig thing with clang, I
> > can see it too. And the reason seems to be something we've seen
> > before: UBSAN functions being considered non-return by clang, so clang
> > generates code like this:
> >
> >    ....
> > .LBB24_3:
> >         callq   __sanitizer_cov_trace_pc@PLT
> >         movl    $2, %esi
> >         movq    $.L__unnamed_3, %rdi
> >         callq   __ubsan_handle_out_of_bounds
> > .Lfunc_end24:
> >         .size   m5mols_set_fmt, .Lfunc_end24-m5mols_set_fmt
> >
> > ie the last thing in that m5mols_set_fmt() function is a call to
> > __ubsan_handle_out_of_bounds, and then it "falls through" to the next
> > function.
> >
> > And yes, I absolutely *detest* how clang does that. Not only does it
> > cause objtool sanity checking issues, it fundamentally means that we
> > can never treat UBSAN warnings as warnings. They are always fatal.

I've hit these cases a few times too. The __ubsan_handle_* stuff
is designed to be recoverable. I think there are some cases where
we're tripping over Clang bugs, though. Some of the past issues
have been with Clang thinking some UBSAN feature was trap-only
(e.g. -fsanitizer=local-bounds), but here it actually generated the call,
but decided it was no-return. *sigh*

> > But I suspect we need to disable UBSAN for clang, because clang gets
> > this so *horribly* wrong.

Which is to say, it normally gets it right, but there are some instances
where things go weird. If it was horribly wrong, there would be a LOT
more objtool warnings. :)

I'm not opposed to disabling UBSAN for all*config builds if we need to,
but I want to get these Clang bugs found and fixed so I'd be sad to lose
the coverage.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] Block fixes for 6.3-rc3
  2023-03-17 19:50     ` Kees Cook
@ 2023-03-17 20:30       ` Linus Torvalds
  2023-03-17 20:42         ` Miguel Ojeda
  2023-03-17 20:35       ` Nick Desaulniers
  1 sibling, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2023-03-17 20:30 UTC (permalink / raw)
  To: Kees Cook
  Cc: Nick Desaulniers, Jens Axboe, Nathan Chancellor, linux-block,
	clang-built-linux, linux-hardening

On Fri, Mar 17, 2023 at 12:50 PM Kees Cook <keescook@chromium.org> wrote:
>
> I've hit these cases a few times too. The __ubsan_handle_* stuff
> is designed to be recoverable. I think there are some cases where
> we're tripping over Clang bugs, though. Some of the past issues
> have been with Clang thinking some UBSAN feature was trap-only
> (e.g. -fsanitizer=local-bounds), but here it actually generated the call,
> but decided it was no-return. *sigh*

Hmm. The problem here is that we only get an objdump warning by pure luck.

This isn't a "objdump will always catch it", it's more like "objdump
will only catch it if the code happens to be moved to the end of the
function".

So I'm very happy to hear that it's not by design, and that's very good.

But the whole "this is a compiler bug" doesn't exactly give me the
warm and fuzzies either.

The code generation looks *really* odd to me. This is, as far as I can
tell, the code sequence that this code has:

m5mols_set_fmt:
  ..
        movq    %rdx, %rbp
  ..
        movl    16(%rbp), %ebx
        movq    %rbx, %rdi
  ..
        cmpq    $16385, %rbx                    # imm = 0x4001
        jne     .LBB24_1
  ..
.LBB24_1:
        cmpl    $8199, %ebx                     # imm = 0x2007
        jne     .LBB24_3
  ..
.LBB24_3:
        callq   __sanitizer_cov_trace_pc@PLT
        movl    $2, %esi
        movq    $.L__unnamed_3, %rdi
        callq   __ubsan_handle_out_of_bounds

and as far as I can tell, that "movl 16(%rbp), %ebx" is

        enum m5mols_restype stype = __find_restype(mf->code);

in __find_resolution(). But I don't understand those odd constants
it's comparing against at all.

'enum m5mols_restype' should be in the range 0..2, but it seems to use
a 64-bit compare against '16385' (and then a 32-bit compare against
'8199') to decide it's out-of-bounds.

I must be *completely* mis-reading this, because none of that makes a
whit of sense to me.

It's possible that clang is just *so* terminally confused that it just
generates random code, but it's more likely that I'm mis-reading
things.

The above is my interpretation of what I see with the current F37
clang version for a "allmodconfig" build of that file.

My 'clang -v' says

    clang version 15.0.7 (Fedora 15.0.7-1.fc37)

in case some clang person wants to try to recreate this.

> I'm not opposed to disabling UBSAN for all*config builds if we need to,
> but I want to get these Clang bugs found and fixed so I'd be sad to lose
> the coverage.

I wonder how many people actually end up having UBSAN actually
enabled. I get the feeling that most of the coverage it gets is
exactly just the compile-only coverage by "allmodconfig" and friends.

Otherwise it would be really easy to just say

        depends on !COMPILE_TEST

for all these run-time debug things. But exactly *because* they have
caused endless amounts of build-time pain - *and* because I doubt how
much real run-time testing they get - limiting it to non-build tests
sounds a bit counter-productive.

              Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] Block fixes for 6.3-rc3
  2023-03-17 18:50   ` Linus Torvalds
@ 2023-03-17 20:31     ` Miguel Ojeda
  2023-03-17 20:45       ` Linus Torvalds
  0 siblings, 1 reply; 16+ messages in thread
From: Miguel Ojeda @ 2023-03-17 20:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jens Axboe, Nathan Chancellor, Nick Desaulniers, Kees Cook,
	linux-block, clang-built-linux

On Fri, Mar 17, 2023 at 7:50 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Debug support *MUST* have an option to just continue. Sure, make "this
> is fatal" be an option *too*, because if you are a developer, maybe
> you really want the "fail hard, fail quickly" behavior.

At least for userspace it has different modes -- the default is to
report and continue:

    -fsanitize=...: print a verbose error report and continue
execution (default);
    -fno-sanitize-recover=...: print a verbose error report and exit
the program;
    -fsanitize-trap=...: execute a trap instruction (doesn’t require
UBSan run-time support).
    -fno-sanitize=...: disable any check, e.g., -fno-sanitize=alignment.

    (https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#usage)

For the kernel I assume it is meant to work due to `UBSAN_TRAP`, but
the optimizer may be getting in the way.

From a quick look, it seems that `__find_restype` (which gets inlined
into `m5mols_set_fmt` via `__find_resolution`) has a small loop:

    do {
        if (code == m5mols_default_ffmt[type].code)
            return type;
    } while (type++ != SIZE_DEFAULT_FFMT);

that I guess gets simplified into something like:

    if (code == first code)
        yield 0;
    if (code == second code)
        yield 1;
    out of bounds;

When UBSAN is disabled, it may be assuming:

    if (code == first code)
        yield 0;
    yield 1;

Thus no issue.

Though even with UBSAN disabled, if I add a dummy opaque call inside
the loop, it goes back to something like before, except the jump goes
nowhere and then `objtool` complains again:

    .LBB24_20:
        callq    dummydummydummy
    .Lfunc_end24:

So it is reproducible even without UBSAN getting involved:
https://godbolt.org/z/hTe91b3eG

Cheers,
Miguel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] Block fixes for 6.3-rc3
  2023-03-17 19:50     ` Kees Cook
  2023-03-17 20:30       ` Linus Torvalds
@ 2023-03-17 20:35       ` Nick Desaulniers
  1 sibling, 0 replies; 16+ messages in thread
From: Nick Desaulniers @ 2023-03-17 20:35 UTC (permalink / raw)
  To: Kees Cook, Linus Torvalds
  Cc: Jens Axboe, Nathan Chancellor, linux-block, clang-built-linux,
	linux-hardening

On Fri, Mar 17, 2023 at 12:50 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Fri, Mar 17, 2023 at 11:59:11AM -0700, Nick Desaulniers wrote:
> > +Sanitizer folks (BCC'd)
> > (Top of lore thread:
> > https://lore.kernel.org/linux-block/9d0ef355-f430-e8e2-c844-b34cfcf60d88@kernel.dk/)
> >
> > On Fri, Mar 17, 2023 at 11:35 AM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > On Fri, Mar 17, 2023 at 10:16 AM Jens Axboe <axboe@kernel.dk> wrote:
> > > > Side note - when doing the usual allmodconfig builds with gcc-12 and
> > > > clang before sending them out, for the latter I see this warning being
> > > > spewed with clang-15:
> > > >
> > > > drivers/media/i2c/m5mols/m5mols.o: warning: objtool: m5mols_set_fmt() falls through to next function m5mols_get_frame_desc()
> > > >
> > > > Obviously not related to my changes, but mentioning it in case it has
> > > > been missed as I know you love squeaky clean builds :-). Doesn't happen
> > > > with clang-14.
> > >
> > > Hmm. I have clang-15 too, but I do the allmodconfig builds with gcc,
> > > and only my own "normal config" builds with clang.
> > >
> > > So I don't see this particular issue and my builds are still squeaky clean.
> > >
> > > That said, when I explicitly try that allmodconfig thing with clang, I
> > > can see it too. And the reason seems to be something we've seen
> > > before: UBSAN functions being considered non-return by clang, so clang
> > > generates code like this:
> > >
> > >    ....
> > > .LBB24_3:
> > >         callq   __sanitizer_cov_trace_pc@PLT
> > >         movl    $2, %esi
> > >         movq    $.L__unnamed_3, %rdi
> > >         callq   __ubsan_handle_out_of_bounds
> > > .Lfunc_end24:
> > >         .size   m5mols_set_fmt, .Lfunc_end24-m5mols_set_fmt
> > >
> > > ie the last thing in that m5mols_set_fmt() function is a call to
> > > __ubsan_handle_out_of_bounds, and then it "falls through" to the next
> > > function.
> > >
> > > And yes, I absolutely *detest* how clang does that. Not only does it
> > > cause objtool sanity checking issues, it fundamentally means that we
> > > can never treat UBSAN warnings as warnings. They are always fatal.
>
> I've hit these cases a few times too. The __ubsan_handle_* stuff
> is designed to be recoverable. I think there are some cases where
> we're tripping over Clang bugs, though. Some of the past issues
> have been with Clang thinking some UBSAN feature was trap-only
> (e.g. -fsanitizer=local-bounds), but here it actually generated the call,
> but decided it was no-return. *sigh*

I think no-return is a red herring (or rather, I don't think noreturn
is at play here at all). Looking at the IR, I don't see anything that
indicated anything was deduced to be noreturn.

It looks like this is coming from the loop in __find_restype() being
fully unrolled.

On the initial iteration, `type` == `M5MOLS_RESTYPE_MONITOR` == 0.
`m5mols_default_ffmt` is a 2 element array.
If we don't match we loop again, `type` == `M5MOLS_RESTYPE_CAPTURE` == 1.
`SIZE_DEFAULT_FFMT` == ARRAY_SIZE(m5mols_default_ffmt) == 2, so we loop again.
`type` == 2, accessing m5mols_default_ffmt out of bounds.

Perhaps this code meant to simply use a for loop rather than do-while?
(Or pre-increment rather than post increment for the do-while)?

```
diff --git a/drivers/media/i2c/m5mols/m5mols_core.c
b/drivers/media/i2c/m5mols/m5mols_core.c
index 2b01873ba0db..603b1036127e 100644
--- a/drivers/media/i2c/m5mols/m5mols_core.c
+++ b/drivers/media/i2c/m5mols/m5mols_core.c
@@ -485,10 +485,9 @@ static enum m5mols_restype __find_restype(u32 code)
 {
        enum m5mols_restype type = M5MOLS_RESTYPE_MONITOR;

-       do {
+       for (; type != M5MOLS_RESTYPE_MAX; ++type)
                if (code == m5mols_default_ffmt[type].code)
                        return type;
-       } while (type++ != SIZE_DEFAULT_FFMT);

        return 0;
 }
```

>
> > > But I suspect we need to disable UBSAN for clang, because clang gets
> > > this so *horribly* wrong.

I think Linus is thinking about,
commit e5d523f1ae8f ("ubsan: disable UBSAN_DIV_ZERO for clang")

I can see the loop unroller inserting a branch on "poison" which is a
magic value in LLVM related to but distinct from "undef".  That gets
replaced with an "unreachable" instruction.
I wonder if we can change loop unroller to not insert branch on poison
when the sanitizers are enabled, or freeze poison.
https://llvm.org/devmtg/2020-09/slides/Lee-UndefPoison.pdf

Maybe Linus has thoughts he can share on this thread:
https://github.com/llvm/llvm-project/issues/56289?

Finally, there's always `-mllvm -trap-unreachable` though that's a
last resort kind of thing; `-mllvm` flags need to be passed to the
linker for LTO, and compiler for non-LTO and LTO.  I do think in this
case that the fallthough is bringing to our attention an issue in the
source.

>
> Which is to say, it normally gets it right, but there are some instances
> where things go weird. If it was horribly wrong, there would be a LOT
> more objtool warnings. :)
>
> I'm not opposed to disabling UBSAN for all*config builds if we need to,
> but I want to get these Clang bugs found and fixed so I'd be sad to lose
> the coverage.
>
> --
> Kees Cook



--
Thanks,
~Nick Desaulniers

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] Block fixes for 6.3-rc3
  2023-03-17 20:30       ` Linus Torvalds
@ 2023-03-17 20:42         ` Miguel Ojeda
  2023-03-17 20:51           ` Linus Torvalds
  0 siblings, 1 reply; 16+ messages in thread
From: Miguel Ojeda @ 2023-03-17 20:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kees Cook, Nick Desaulniers, Jens Axboe, Nathan Chancellor,
	linux-block, clang-built-linux, linux-hardening

On Fri, Mar 17, 2023 at 9:31 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> 'enum m5mols_restype' should be in the range 0..2, but it seems to use
> a 64-bit compare against '16385' (and then a 32-bit compare against
> '8199') to decide it's out-of-bounds.

It is comparing against just the `.code` in the `m5mols_default_ffmt`
table, i.e. the `MEDIA_BUS_FMT_VYUY8_2X8` (8199 = 0x2007) and
`MEDIA_BUS_FMT_JPEG_1X8` (16385 = 0x4001), see
https://elixir.bootlin.com/linux/latest/source/drivers/media/i2c/m5mols/m5mols_core.c#L52.

So it is basically:

    if (code == m5mols_default_ffmt[0].code)
        return type;
    if (type == 2)
        continue;
    ++type;

    if (code == m5mols_default_ffmt[1].code)
        return type;
    if (type == 2)
        continue;
    ++type;

    m5mols_default_ffmt[2].code -> out of bounds UB -> unreachable

If the condition had `++type` instead, it would not be a problem,
because the loop stops before we go into the out of bounds access thus
no UB.

Cheers,
Miguel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] Block fixes for 6.3-rc3
  2023-03-17 20:31     ` Miguel Ojeda
@ 2023-03-17 20:45       ` Linus Torvalds
  0 siblings, 0 replies; 16+ messages in thread
From: Linus Torvalds @ 2023-03-17 20:45 UTC (permalink / raw)
  To: Miguel Ojeda
  Cc: Jens Axboe, Nathan Chancellor, Nick Desaulniers, Kees Cook,
	linux-block, clang-built-linux

On Fri, Mar 17, 2023 at 1:31 PM Miguel Ojeda
<miguel.ojeda.sandonis@gmail.com> wrote:
>
> From a quick look, it seems that `__find_restype` (which gets inlined
> into `m5mols_set_fmt` via `__find_resolution`) has a small loop:
>
>     do {
>         if (code == m5mols_default_ffmt[type].code)
>             return type;
>     } while (type++ != SIZE_DEFAULT_FFMT);
>
> that I guess gets simplified into something like:
>
>     if (code == first code)
>         yield 0;
>     if (code == second code)
>         yield 1;
>     out of bounds;

Ahh. That explains the odd constants I saw. I did figure out it had
something to do with loading 'mf->code', but then it went off the
rails.

> Though even with UBSAN disabled, if I add a dummy opaque call inside
> the loop, it goes back to something like before, except the jump goes
> nowhere and then `objtool` complains again:
>
>     .LBB24_20:
>         callq    dummydummydummy
>     .Lfunc_end24:
>
> So it is reproducible even without UBSAN getting involved:
> https://godbolt.org/z/hTe91b3eG

That code generation is some crazy stuff.

Yeah, that's not acceptable, and the bug is clearly not UBSAN, but
some generic bogus clang thing.

Much nicer to try to debug this as a "objtool complains about the
generated code" rather than as some insane runtine problem with code
falling off the end of the function.

                Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] Block fixes for 6.3-rc3
  2023-03-17 20:42         ` Miguel Ojeda
@ 2023-03-17 20:51           ` Linus Torvalds
  2023-03-17 21:00             ` Linus Torvalds
                               ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Linus Torvalds @ 2023-03-17 20:51 UTC (permalink / raw)
  To: Miguel Ojeda
  Cc: Kees Cook, Nick Desaulniers, Jens Axboe, Nathan Chancellor,
	linux-block, clang-built-linux, linux-hardening

On Fri, Mar 17, 2023 at 1:42 PM Miguel Ojeda
<miguel.ojeda.sandonis@gmail.com> wrote:
>
> It is comparing against just the `.code` in the `m5mols_default_ffmt`
> table, i.e. the `MEDIA_BUS_FMT_VYUY8_2X8` (8199 = 0x2007) and
> `MEDIA_BUS_FMT_JPEG_1X8` (16385 = 0x4001), see

Yeah, I see what it's doing.

But:

> If the condition had `++type` instead, it would not be a problem,
> because the loop stops before we go into the out of bounds access thus
> no UB.

Yeah, but clang really should have generated a proper third iteration,
which calls that "out of bounds" case, and then returns, instead fo
falling off the end.

I do think that on the kernel side, the fix is to just change

        } while (type++ != SIZE_DEFAULT_FFMT);

to

        } while (++type != SIZE_DEFAULT_FFMT);

but I would *really* like clang to be fixed to not silently generate
code that does insane things and would be basically impossible to
debug if it ever triggers.

We would have spent a *lot* of time wondering how the heck we Oopsed
in m5mols_get_frame_desc().

             Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] Block fixes for 6.3-rc3
  2023-03-17 20:51           ` Linus Torvalds
@ 2023-03-17 21:00             ` Linus Torvalds
  2023-03-17 22:56               ` Nick Desaulniers
  2023-03-17 21:01             ` Miguel Ojeda
                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2023-03-17 21:00 UTC (permalink / raw)
  To: Miguel Ojeda
  Cc: Kees Cook, Nick Desaulniers, Jens Axboe, Nathan Chancellor,
	linux-block, clang-built-linux, linux-hardening

On Fri, Mar 17, 2023 at 1:51 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> but I would *really* like clang to be fixed to not silently generate
> code that does insane things and would be basically impossible to
> debug if it ever triggers.

Side note: the key word here is "silently".

If clang notices that it generates crazy code, a warning at build-time
would be preferable to the "oh, we noticed the crazy code generation
because we do sanity checking that just happened to catch it".

If that "fall-through due to impossible third iteration" had happened
*inside* the function, rather than at the end, we'd have never
noticed.

So we were kind of lucky in just where things triggered this time.
Maybe that's always going to be true due to some random internal clang
logic ("dead code fallthrough is always at the end because of how we
sort basic blocks"), but I don't see why it would be in general.

                      Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] Block fixes for 6.3-rc3
  2023-03-17 20:51           ` Linus Torvalds
  2023-03-17 21:00             ` Linus Torvalds
@ 2023-03-17 21:01             ` Miguel Ojeda
  2023-03-18 18:20             ` Linus Torvalds
  2023-03-19  0:48             ` Mauro Carvalho Chehab
  3 siblings, 0 replies; 16+ messages in thread
From: Miguel Ojeda @ 2023-03-17 21:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kees Cook, Nick Desaulniers, Jens Axboe, Nathan Chancellor,
	linux-block, clang-built-linux, linux-hardening

On Fri, Mar 17, 2023 at 9:51 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Yeah, I see what it's doing.

Yeah, sorry, I saw your later email after sending that one.

> Yeah, but clang really should have generated a proper third iteration,
> which calls that "out of bounds" case, and then returns, instead fo
> falling off the end.
>
> I do think that on the kernel side, the fix is to just change
>
>         } while (type++ != SIZE_DEFAULT_FFMT);
>
> to
>
>         } while (++type != SIZE_DEFAULT_FFMT);
>
> but I would *really* like clang to be fixed to not silently generate
> code that does insane things and would be basically impossible to
> debug if it ever triggers.

Not sure how easy is for them to realize that they should do a 3rd
iteration. But perhaps it would be possible that Clang/LLVM does a
similar check to objtool and at least emit a warning about similar
situations that would help developers diagnose this (since it should
have way more information about what happened than objtool).

Cheers,
Miguel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] Block fixes for 6.3-rc3
  2023-03-17 21:00             ` Linus Torvalds
@ 2023-03-17 22:56               ` Nick Desaulniers
  0 siblings, 0 replies; 16+ messages in thread
From: Nick Desaulniers @ 2023-03-17 22:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Miguel Ojeda, Kees Cook, Jens Axboe, Nathan Chancellor,
	linux-block, clang-built-linux, linux-hardening

On Fri, Mar 17, 2023 at 2:00 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Fri, Mar 17, 2023 at 1:51 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > but I would *really* like clang to be fixed to not silently generate
> > code that does insane things and would be basically impossible to
> > debug if it ever triggers.
>
> Side note: the key word here is "silently".
>
> If clang notices that it generates crazy code, a warning at build-time
> would be preferable to the "oh, we noticed the crazy code generation
> because we do sanity checking that just happened to catch it".

That's fair.  I have something hacked up locally that can spot the
fallthough from m5mols_set_fmt() as objtool did. With some polish, we
can likely ship that as a compiler warning.  Then we can have these
checks regardless of objtool arch support.

First I need to teach LLVM that __stack_chk_fail is noreturn, though
I've only verified that thus far in glibc, musl, and bionic; I still
need to check that's the case for the BSDs' libcs.
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] Block fixes for 6.3-rc3
  2023-03-17 20:51           ` Linus Torvalds
  2023-03-17 21:00             ` Linus Torvalds
  2023-03-17 21:01             ` Miguel Ojeda
@ 2023-03-18 18:20             ` Linus Torvalds
  2023-03-19  0:48             ` Mauro Carvalho Chehab
  3 siblings, 0 replies; 16+ messages in thread
From: Linus Torvalds @ 2023-03-18 18:20 UTC (permalink / raw)
  To: Miguel Ojeda, HeungJun, Kim, Sylwester Nawrocki, Kyungmin Park,
	Mauro Carvalho Chehab
  Cc: Kees Cook, Nick Desaulniers, Jens Axboe, Nathan Chancellor,
	linux-block, clang-built-linux, linux-hardening

On Fri, Mar 17, 2023 at 1:51 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I do think that on the kernel side, the fix is to just change
>
>         } while (type++ != SIZE_DEFAULT_FFMT);
>
> to
>
>         } while (++type != SIZE_DEFAULT_FFMT);

Ok, I ended up deciding to just commit that minimal change, even
though it might have been better to just make it a normal for-loop
(and use M5MOLS_RESTYPE_MAX instead as the end condition).

So maybe it would be more legible (and less likely to have had that
off-by-one) if the loop had been

        for (type = 0; type < M5MOLS_RESTYPE_MAX; type++)

instead. But I'll leave that decision to the driver authors (now cc'd).

For people brought in late, this is now commit efbcbb12ee99 ("media:
m5mols: fix off-by-one loop termination error") with link to the
discussion here

    https://lore.kernel.org/linux-block/CAHk-=wgTSdKYbmB1JYM5vmHMcD9J9UZr0mn7BOYM_LudrP+Xvw@mail.gmail.com/

so you can see the history of it (me having initially blamed UBSAN,
but the problem can be triggered at least in theory without it).

                  Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] Block fixes for 6.3-rc3
  2023-03-17 20:51           ` Linus Torvalds
                               ` (2 preceding siblings ...)
  2023-03-18 18:20             ` Linus Torvalds
@ 2023-03-19  0:48             ` Mauro Carvalho Chehab
  3 siblings, 0 replies; 16+ messages in thread
From: Mauro Carvalho Chehab @ 2023-03-19  0:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Miguel Ojeda, Kees Cook, Nick Desaulniers, Jens Axboe,
	Nathan Chancellor, linux-block, clang-built-linux,
	linux-hardening, Sylwester Nawrocki, Marek Szyprowski,
	Sakari Ailus, linux-media

Em Fri, 17 Mar 2023 13:51:17 -0700
Linus Torvalds <torvalds@linux-foundation.org> escreveu:

> On Fri, Mar 17, 2023 at 1:42 PM Miguel Ojeda
> <miguel.ojeda.sandonis@gmail.com> wrote:
> >
> > It is comparing against just the `.code` in the `m5mols_default_ffmt`
> > table, i.e. the `MEDIA_BUS_FMT_VYUY8_2X8` (8199 = 0x2007) and
> > `MEDIA_BUS_FMT_JPEG_1X8` (16385 = 0x4001), see  
> 
> Yeah, I see what it's doing.
> 
> But:
> 
> > If the condition had `++type` instead, it would not be a problem,
> > because the loop stops before we go into the out of bounds access thus
> > no UB.  
> 
> Yeah, but clang really should have generated a proper third iteration,
> which calls that "out of bounds" case, and then returns, instead fo
> falling off the end.
> 
> I do think that on the kernel side, the fix is to just change
> 
>         } while (type++ != SIZE_DEFAULT_FFMT);
> 
> to
> 
>         } while (++type != SIZE_DEFAULT_FFMT);

Yeah, that seems to be the right fix to me too.

Ack on such change: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/patch/?id=efbcbb12ee99f750c9f25c873b55ad774871de2a

Regards,
Mauro


> 
> but I would *really* like clang to be fixed to not silently generate
> code that does insane things and would be basically impossible to
> debug if it ever triggers.
> 
> We would have spent a *lot* of time wondering how the heck we Oopsed
> in m5mols_get_frame_desc().
> 
>              Linus
> 



Thanks,
Mauro

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-03-19  0:48 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <9d0ef355-f430-e8e2-c844-b34cfcf60d88@kernel.dk>
2023-03-17 18:35 ` [GIT PULL] Block fixes for 6.3-rc3 Linus Torvalds
2023-03-17 18:50   ` Linus Torvalds
2023-03-17 20:31     ` Miguel Ojeda
2023-03-17 20:45       ` Linus Torvalds
2023-03-17 18:59   ` Nick Desaulniers
2023-03-17 19:50     ` Kees Cook
2023-03-17 20:30       ` Linus Torvalds
2023-03-17 20:42         ` Miguel Ojeda
2023-03-17 20:51           ` Linus Torvalds
2023-03-17 21:00             ` Linus Torvalds
2023-03-17 22:56               ` Nick Desaulniers
2023-03-17 21:01             ` Miguel Ojeda
2023-03-18 18:20             ` Linus Torvalds
2023-03-19  0:48             ` Mauro Carvalho Chehab
2023-03-17 20:35       ` Nick Desaulniers
2023-03-17 19:44   ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).