linux-toolchains.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] sched: Work around undefined behavior in sched class checking
       [not found] <20210505033945.1282851-1-ak@linux.intel.com>
@ 2021-05-05  6:46 ` Peter Zijlstra
  2021-05-05  8:47   ` Florian Weimer
  2021-05-05 14:34   ` Andi Kleen
  0 siblings, 2 replies; 10+ messages in thread
From: Peter Zijlstra @ 2021-05-05  6:46 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, Andi Kleen, linux-toolchains

On Tue, May 04, 2021 at 08:39:45PM -0700, Andi Kleen wrote:
> From: Andi Kleen <andi@firstfloor.org>
> 
> The scheduler initialization code checks that the scheduling
> classes are consecutive in memory by comparing the end
> addresses with the next address.
> 
> Technically in ISO C comparing symbol addresseses outside different objects
> is undefined. With LTO gcc 10 tries to exploits this and creates an
> unconditional BUG_ON in the scheduler initialization, resulting
> in a boot hang.
> 
> Use RELOC_HIDE to make this work. This hides the symbols from gcc,
> so the optimizer won't make these assumption. I also split
> the BUG_ONs in multiple.

Urgh, that insanity again :/ Can't we pretty please get a GCC flag to
disable that?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] sched: Work around undefined behavior in sched class checking
  2021-05-05  6:46 ` [PATCH] sched: Work around undefined behavior in sched class checking Peter Zijlstra
@ 2021-05-05  8:47   ` Florian Weimer
  2021-05-05  9:04     ` Peter Zijlstra
  2021-05-05 14:39     ` Andi Kleen
  2021-05-05 14:34   ` Andi Kleen
  1 sibling, 2 replies; 10+ messages in thread
From: Florian Weimer @ 2021-05-05  8:47 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Andi Kleen, linux-kernel, Andi Kleen, linux-toolchains

* Peter Zijlstra:

> On Tue, May 04, 2021 at 08:39:45PM -0700, Andi Kleen wrote:
>> From: Andi Kleen <andi@firstfloor.org>
>> 
>> The scheduler initialization code checks that the scheduling
>> classes are consecutive in memory by comparing the end
>> addresses with the next address.
>> 
>> Technically in ISO C comparing symbol addresseses outside different objects
>> is undefined. With LTO gcc 10 tries to exploits this and creates an
>> unconditional BUG_ON in the scheduler initialization, resulting
>> in a boot hang.
>> 
>> Use RELOC_HIDE to make this work. This hides the symbols from gcc,
>> so the optimizer won't make these assumption. I also split
>> the BUG_ONs in multiple.
>
> Urgh, that insanity again :/ Can't we pretty please get a GCC flag to
> disable that?

Context:

  <https://lore.kernel.org/lkml/20210505033945.1282851-1-ak@linux.intel.com/>

Obviously, GCC doesn't do this in general.  Would you please provide a
minimal test case?

Thanks,
Florian


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] sched: Work around undefined behavior in sched class checking
  2021-05-05  8:47   ` Florian Weimer
@ 2021-05-05  9:04     ` Peter Zijlstra
  2021-05-05 14:39     ` Andi Kleen
  1 sibling, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2021-05-05  9:04 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Andi Kleen, linux-kernel, Andi Kleen, linux-toolchains

On Wed, May 05, 2021 at 10:47:07AM +0200, Florian Weimer wrote:
> * Peter Zijlstra:
> 
> > On Tue, May 04, 2021 at 08:39:45PM -0700, Andi Kleen wrote:
> >> From: Andi Kleen <andi@firstfloor.org>
> >> 
> >> The scheduler initialization code checks that the scheduling
> >> classes are consecutive in memory by comparing the end
> >> addresses with the next address.
> >> 
> >> Technically in ISO C comparing symbol addresseses outside different objects
> >> is undefined. With LTO gcc 10 tries to exploits this and creates an
> >> unconditional BUG_ON in the scheduler initialization, resulting
> >> in a boot hang.
> >> 
> >> Use RELOC_HIDE to make this work. This hides the symbols from gcc,
> >> so the optimizer won't make these assumption. I also split
> >> the BUG_ONs in multiple.
> >
> > Urgh, that insanity again :/ Can't we pretty please get a GCC flag to
> > disable that?
> 
> Context:
> 
>   <https://lore.kernel.org/lkml/20210505033945.1282851-1-ak@linux.intel.com/>
> 
> Obviously, GCC doesn't do this in general.  Would you please provide a
> minimal test case?

Andi has this GCC-LTO patch-set that triggers this, but the thing I'd
like fixed is the UB mentioned above. Not this particular instance.

And, we've had the problem before, see all the RELOC_HIDE crud. Having
this pointer arith outside object be UB is just really annoying. And in
the spirit of UB bad, can we please get a flag to remove the UB and have
it do the obvious, just do the arithmetic and don't do daft things.

Pretty please.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] sched: Work around undefined behavior in sched class checking
  2021-05-05  6:46 ` [PATCH] sched: Work around undefined behavior in sched class checking Peter Zijlstra
  2021-05-05  8:47   ` Florian Weimer
@ 2021-05-05 14:34   ` Andi Kleen
  2021-05-05 15:34     ` Peter Zijlstra
  1 sibling, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2021-05-05 14:34 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Andi Kleen, linux-toolchains

> > Use RELOC_HIDE to make this work. This hides the symbols from gcc,
> > so the optimizer won't make these assumption. I also split
> > the BUG_ONs in multiple.
> 
> Urgh, that insanity again :/ Can't we pretty please get a GCC flag to
> disable that?

Even if that was done (I could totally see the gcc people pushing back on this;
why should they add special flags just for Linux developers not understanding
ISO-C?) you would still need the fix for already shipping compilers.

-Andi


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] sched: Work around undefined behavior in sched class checking
  2021-05-05  8:47   ` Florian Weimer
  2021-05-05  9:04     ` Peter Zijlstra
@ 2021-05-05 14:39     ` Andi Kleen
  2021-05-05 16:41       ` Nick Desaulniers
  1 sibling, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2021-05-05 14:39 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Peter Zijlstra, linux-kernel, Andi Kleen, linux-toolchains

> Context:
> 
>   <https://lore.kernel.org/lkml/20210505033945.1282851-1-ak@linux.intel.com/>
> 
> Obviously, GCC doesn't do this in general. 

We've seen it in other cases before, that's why RELOC_HIDE exists.
A classic case was __pa_symbol()

That dates back nearly two decades at this point.

>  Would you please provide a
> minimal test case?

You can only reproduce it with a LTO build because it needs knowledge
between different translation units for this specific case.

But gcc will totally do the optimization even without LTO if it can
prove the same inside a single TU.

If you want to reproduce it you can use my tree here
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc lto-5.12-3
and revert the fix. The kernel will not boot.

-Andi

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] sched: Work around undefined behavior in sched class checking
  2021-05-05 14:34   ` Andi Kleen
@ 2021-05-05 15:34     ` Peter Zijlstra
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2021-05-05 15:34 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, Andi Kleen, linux-toolchains

On Wed, May 05, 2021 at 07:34:42AM -0700, Andi Kleen wrote:
> > > Use RELOC_HIDE to make this work. This hides the symbols from gcc,
> > > so the optimizer won't make these assumption. I also split
> > > the BUG_ONs in multiple.
> > 
> > Urgh, that insanity again :/ Can't we pretty please get a GCC flag to
> > disable that?
> 
> Even if that was done (I could totally see the gcc people pushing back on this;
> why should they add special flags just for Linux developers not understanding
> ISO-C?)

I understand C fine, I just don't agree with it. I also want to
explicitly define as much UB as is possible, because UB is just utter
garbage.

So just like we do with -fwrapv and others, add more knobs that
explictly define away UB. Less UB is more better. This being C it's
unlikely we'll ever get to no UB, but we should damn well try :-)

> you would still need the fix for already shipping compilers.

Yes, there is that.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] sched: Work around undefined behavior in sched class checking
  2021-05-05 14:39     ` Andi Kleen
@ 2021-05-05 16:41       ` Nick Desaulniers
  2021-05-05 16:48         ` Andi Kleen
  0 siblings, 1 reply; 10+ messages in thread
From: Nick Desaulniers @ 2021-05-05 16:41 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Florian Weimer, Peter Zijlstra, LKML, Andi Kleen, linux-toolchains

On Wed, May 5, 2021 at 7:39 AM Andi Kleen <ak@linux.intel.com> wrote:
>
> >  Would you please provide a
> > minimal test case?
>
> You can only reproduce it with a LTO build because it needs knowledge
> between different translation units for this specific case.
>
> But gcc will totally do the optimization even without LTO if it can
> prove the same inside a single TU.

It would be helpful to isolate a test case that doesn't rely on LTO,
if possible.
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] sched: Work around undefined behavior in sched class checking
  2021-05-05 16:41       ` Nick Desaulniers
@ 2021-05-05 16:48         ` Andi Kleen
  2021-05-05 17:08           ` Nick Desaulniers
  0 siblings, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2021-05-05 16:48 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Florian Weimer, Peter Zijlstra, LKML, Andi Kleen, linux-toolchains


On 5/5/2021 9:41 AM, Nick Desaulniers wrote:
> On Wed, May 5, 2021 at 7:39 AM Andi Kleen <ak@linux.intel.com> wrote:
>>>   Would you please provide a
>>> minimal test case?
>> You can only reproduce it with a LTO build because it needs knowledge
>> between different translation units for this specific case.
>>
>> But gcc will totally do the optimization even without LTO if it can
>> prove the same inside a single TU.
> It would be helpful to isolate a test case that doesn't rely on LTO,
> if possible.

Like I wrote earlier we used to see it all the time in __pa_symbol 
before it used RELOC_HIDE. I bet if you make RELOC_HIDE a nop you'll see 
multiple instances.

But not sure why you want a test case?

-Andi


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] sched: Work around undefined behavior in sched class checking
  2021-05-05 16:48         ` Andi Kleen
@ 2021-05-05 17:08           ` Nick Desaulniers
  2021-05-05 17:20             ` Andi Kleen
  0 siblings, 1 reply; 10+ messages in thread
From: Nick Desaulniers @ 2021-05-05 17:08 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Florian Weimer, Peter Zijlstra, LKML, Andi Kleen, linux-toolchains

On Wed, May 5, 2021 at 9:49 AM Andi Kleen <ak@linux.intel.com> wrote:
>
>
> On 5/5/2021 9:41 AM, Nick Desaulniers wrote:
> > On Wed, May 5, 2021 at 7:39 AM Andi Kleen <ak@linux.intel.com> wrote:
> >>>   Would you please provide a
> >>> minimal test case?
> >> You can only reproduce it with a LTO build because it needs knowledge
> >> between different translation units for this specific case.
> >>
> >> But gcc will totally do the optimization even without LTO if it can
> >> prove the same inside a single TU.
> > It would be helpful to isolate a test case that doesn't rely on LTO,
> > if possible.
>
> Like I wrote earlier we used to see it all the time in __pa_symbol
> before it used RELOC_HIDE. I bet if you make RELOC_HIDE a nop you'll see
> multiple instances.
>
> But not sure why you want a test case?

In general,
when making a feature request to a compiler vendor, having a
digestible snippet of code that demonstrates the problem goes a long
way, much further than "clone this branch of my fork of this project
and do a build and something goes wrong somewhere."  We're too busy to
do that, please take the time to isolate it before making such
requests.

-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] sched: Work around undefined behavior in sched class checking
  2021-05-05 17:08           ` Nick Desaulniers
@ 2021-05-05 17:20             ` Andi Kleen
  0 siblings, 0 replies; 10+ messages in thread
From: Andi Kleen @ 2021-05-05 17:20 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Florian Weimer, Peter Zijlstra, LKML, Andi Kleen, linux-toolchains


>> But not sure why you want a test case?
> In general,
> when making a feature request to a compiler vendor, having a
> digestible snippet of code that demonstrates the problem goes a long
> way, much further than "clone this branch of my fork of this project
> and do a build and something goes wrong somewhere."  We're too busy to
> do that, please take the time to isolate it before making such
> requests.

Ah you misunderstood. I'm not making any feature requests.

-Andi


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-05-05 17:40 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20210505033945.1282851-1-ak@linux.intel.com>
2021-05-05  6:46 ` [PATCH] sched: Work around undefined behavior in sched class checking Peter Zijlstra
2021-05-05  8:47   ` Florian Weimer
2021-05-05  9:04     ` Peter Zijlstra
2021-05-05 14:39     ` Andi Kleen
2021-05-05 16:41       ` Nick Desaulniers
2021-05-05 16:48         ` Andi Kleen
2021-05-05 17:08           ` Nick Desaulniers
2021-05-05 17:20             ` Andi Kleen
2021-05-05 14:34   ` Andi Kleen
2021-05-05 15:34     ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).