* 4.17.0-rc1 doesn't boot.
@ 2018-04-17  8:00 Jörg Otte
  2018-04-17  8:14 ` Borislav Petkov
  0 siblings, 1 reply; 10+ messages in thread
From: Jörg Otte @ 2018-04-17  8:00 UTC
  To: Linus Torvalds, Thomas Gleixner, Linux Kernel Mailing List

Hi,
my notebook doesn't boot with 4.17.0-rc1. Booting stops right after
displaying "loading initial ramdisk..". No further displays.
Also nothing is written to the logs.

First known bad kernel is: 4.16.0-12564-g6b0a02e
Last known good kernel is: 4.16.0-12548-g71b8ebb

Maybe the problem came in with:
6b0a02e:  "Merge branch 'x86-pti-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip"


Thanks, Jörg

* Re: 4.17.0-rc1 doesn't boot.
  2018-04-17  8:00 4.17.0-rc1 doesn't boot Jörg Otte
@ 2018-04-17  8:14 ` Borislav Petkov
  2018-04-17 14:16   ` Jörg Otte
  0 siblings, 1 reply; 10+ messages in thread
From: Borislav Petkov @ 2018-04-17  8:14 UTC
  To: Jörg Otte; +Cc: Linus Torvalds, Thomas Gleixner, Linux Kernel Mailing List

On Tue, Apr 17, 2018 at 10:00:25AM +0200, Jörg Otte wrote:
> Maybe the problem came in with:
> 6b0a02e:  "Merge branch 'x86-pti-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip"

Fetch latest Linus master and try again - there might be a relevant fix
there.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

* Re: 4.17.0-rc1 doesn't boot.
  2018-04-17  8:14 ` Borislav Petkov
@ 2018-04-17 14:16   ` Jörg Otte
  2018-04-17 14:27     ` Borislav Petkov
  0 siblings, 1 reply; 10+ messages in thread
From: Jörg Otte @ 2018-04-17 14:16 UTC
  To: Borislav Petkov
  Cc: Linus Torvalds, Thomas Gleixner, Linux Kernel Mailing List

2018-04-17 10:14 GMT+02:00 Borislav Petkov <bp@alien8.de>:
> On Tue, Apr 17, 2018 at 10:00:25AM +0200, Jörg Otte wrote:
>> Maybe the problem came in with:
>> 6b0a02e:  "Merge branch 'x86-pti-for-linus' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip"
>
> Fetch latest Linus master and try again - there might be a relevant fix
> there.
>

Current Linus master tree (4.17.0-rc1-00021-ga27fc14) doesn't fix it.

Thanks, Jörg

* Re: 4.17.0-rc1 doesn't boot.
  2018-04-17 14:16   ` Jörg Otte
@ 2018-04-17 14:27     ` Borislav Petkov
  2018-04-17 15:21       ` Jörg Otte
  0 siblings, 1 reply; 10+ messages in thread
From: Borislav Petkov @ 2018-04-17 14:27 UTC
  To: Jörg Otte; +Cc: Linus Torvalds, Thomas Gleixner, Linux Kernel Mailing List

On Tue, Apr 17, 2018 at 04:16:34PM +0200, Jörg Otte wrote:
> Current Linus master tree (4.17.0-rc1-00021-ga27fc14) doesn't fix it.

Then pls continue bisecting. Unless someone has a better idea...

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

* Re: 4.17.0-rc1 doesn't boot.
  2018-04-17 14:27     ` Borislav Petkov
@ 2018-04-17 15:21       ` Jörg Otte
  2018-04-17 15:31         ` Borislav Petkov
  0 siblings, 1 reply; 10+ messages in thread
From: Jörg Otte @ 2018-04-17 15:21 UTC
  To: Borislav Petkov
  Cc: Linus Torvalds, Thomas Gleixner, Linux Kernel Mailing List

2018-04-17 16:27 GMT+02:00 Borislav Petkov <bp@alien8.de>:
> On Tue, Apr 17, 2018 at 04:16:34PM +0200, Jörg Otte wrote:
>> Current Linus master tree (4.17.0-rc1-00021-ga27fc14) doesn't fix it.
>
> Then pls continue bisecting. Unless someone has a better idea...
>

finished bisection.
39114b7a743e6759bab4d96b7d9651d44d17e3f9 is the first bad commit
(x86/pti: Never implicitly clear _PAGE_GLOBAL for kernel image).

Thanks, Jörg

* Re: 4.17.0-rc1 doesn't boot.
  2018-04-17 15:21       ` Jörg Otte
@ 2018-04-17 15:31         ` Borislav Petkov
  2018-04-17 16:00           ` Mike Galbraith
  0 siblings, 1 reply; 10+ messages in thread
From: Borislav Petkov @ 2018-04-17 15:31 UTC
  To: Jörg Otte
  Cc: Linus Torvalds, Thomas Gleixner, Linux Kernel Mailing List, Dave Hansen

On Tue, Apr 17, 2018 at 05:21:30PM +0200, Jörg Otte wrote:
> finished bisection.
> 39114b7a743e6759bab4d96b7d9651d44d17e3f9 is the first bad commit
> (x86/pti: Never implicitly clear _PAGE_GLOBAL for kernel image).

Looks like you're not the only one:

http://marc.info/?i=20180417150130.GA11166@ak-laptop.emea.nsn-net.net

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

* Re: 4.17.0-rc1 doesn't boot.
  2018-04-17 15:31         ` Borislav Petkov
@ 2018-04-17 16:00           ` Mike Galbraith
  2018-04-17 16:48             ` Dave Hansen
  0 siblings, 1 reply; 10+ messages in thread
From: Mike Galbraith @ 2018-04-17 16:00 UTC
  To: Borislav Petkov, Jörg Otte
  Cc: Linus Torvalds, Thomas Gleixner, Linux Kernel Mailing List, Dave Hansen

On Tue, 2018-04-17 at 17:31 +0200, Borislav Petkov wrote:
> On Tue, Apr 17, 2018 at 05:21:30PM +0200, Jörg Otte wrote:
> > finished bisection.
> > 39114b7a743e6759bab4d96b7d9651d44d17e3f9 is the first bad commit
> > (x86/pti: Never implicitly clear _PAGE_GLOBAL for kernel image).
> 
> Looks like you're not the only one:
> 
> http://marc.info/?i=20180417150130.GA11166@ak-laptop.emea.nsn-net.net

I'm hitting this too, but only with PREEMPT_RT.  I put a bandaid on it
(tell pti_kernel_image_global_ok() to return true for PREEMPT_RT) while
waiting to see if it was really really as non-rt as it appeared to be.

	-Mike

* Re: 4.17.0-rc1 doesn't boot.
  2018-04-17 16:00           ` Mike Galbraith
@ 2018-04-17 16:48             ` Dave Hansen
  2018-04-17 17:48               ` Mariusz Ceier
  2018-04-17 19:56               ` Dave Hansen
  0 siblings, 2 replies; 10+ messages in thread
From: Dave Hansen @ 2018-04-17 16:48 UTC
  To: Mike Galbraith, Borislav Petkov, Jörg Otte
  Cc: Linus Torvalds, Thomas Gleixner, Linux Kernel Mailing List,
	Aaro Koskinen

On 04/17/2018 09:00 AM, Mike Galbraith wrote:
> On Tue, 2018-04-17 at 17:31 +0200, Borislav Petkov wrote:
>> On Tue, Apr 17, 2018 at 05:21:30PM +0200, Jörg Otte wrote:
>>> finished bisection.
>>> 39114b7a743e6759bab4d96b7d9651d44d17e3f9 is the first bad commit
>>> (x86/pti: Never implicitly clear _PAGE_GLOBAL for kernel image).
>>
>> Looks like you're not the only one:
>>
>> http://marc.info/?i=20180417150130.GA11166@ak-laptop.emea.nsn-net.net
> 
> I'm hitting this too, but only with PREEMPT_RT.  I put a bandaid on it
> (tell pti_kernel_image_global_ok() to return true for PREEMPT_RT) while
> waiting to see if it was really really as non-rt as it appeared to be.

It looks like pti_init() is too early for
change_page_attr()/set_memory_nonglobal() because they look for
irqs_off().  This *should* be OK in practice because we only need to
flush the boot CPU, not the others.  That's what ends up causing the
BUG_ON().

But, there's apparently something else going on too because things don't
boot even with that BUG_ON() backed out.
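
The reasoning above can be sketched as a small user-space model. This is only an illustration under stated assumptions, not the kernel code: `flush_strict()`, `flush_relaxed()` and `enum flush_result` are hypothetical names, and returning `FLUSH_WOULD_BUG` stands in for hitting the BUG_ON(). The strict variant refuses whenever interrupts are off; the relaxed variant allows it while only the boot CPU is online, since a purely local flush needs no cross-CPU call.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for illustration; none of these names are kernel APIs. */
enum flush_result { FLUSH_OK, FLUSH_WOULD_BUG };

/* Strict check: any flush attempted with interrupts off is treated as a bug,
 * because a flush on every CPU would need cross-CPU calls. */
enum flush_result flush_strict(bool irqs_off, int online_cpus)
{
	assert(online_cpus >= 1);
	if (irqs_off)
		return FLUSH_WOULD_BUG;
	return FLUSH_OK;
}

/* Relaxed check: with a single CPU online (early boot), a local flush is
 * enough, so interrupts being off is tolerated in that case only. */
enum flush_result flush_relaxed(bool irqs_off, int online_cpus)
{
	assert(online_cpus >= 1);
	if (irqs_off && online_cpus > 1)
		return FLUSH_WOULD_BUG;
	return FLUSH_OK;
}
```

In this model an early-boot call (interrupts off, one CPU online) trips the strict check but passes the relaxed one. As noted above, relaxing the check alone is not enough to make the kernel boot.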

The good news is that it's easy to reproduce.

* Re: 4.17.0-rc1 doesn't boot.
  2018-04-17 16:48             ` Dave Hansen
@ 2018-04-17 17:48               ` Mariusz Ceier
  2018-04-17 19:56               ` Dave Hansen
  1 sibling, 0 replies; 10+ messages in thread
From: Mariusz Ceier @ 2018-04-17 17:48 UTC
  To: Dave Hansen
  Cc: Mike Galbraith, Borislav Petkov, Jörg Otte, Linus Torvalds,
	Thomas Gleixner, Linux Kernel Mailing List, Aaro Koskinen

On 17 April 2018 at 18:48, Dave Hansen <dave.hansen@linux.intel.com> wrote:
> On 04/17/2018 09:00 AM, Mike Galbraith wrote:
>> On Tue, 2018-04-17 at 17:31 +0200, Borislav Petkov wrote:
>>> On Tue, Apr 17, 2018 at 05:21:30PM +0200, Jörg Otte wrote:
>>>> finished bisection.
>>>> 39114b7a743e6759bab4d96b7d9651d44d17e3f9 is the first bad commit
>>>> (x86/pti: Never implicitly clear _PAGE_GLOBAL for kernel image).
>>>
>>> Looks like you're not the only one:
>>>
>>> http://marc.info/?i=20180417150130.GA11166@ak-laptop.emea.nsn-net.net
>>
>> I'm hitting this too, but only with PREEMPT_RT.  I put a bandaid on it
>> (tell pti_kernel_image_global_ok() to return true for PREEMPT_RT) while
>> waiting to see if it was really really as non-rt as it appeared to be.
>
> It looks like pti_init() is too early for
> change_page_attr()/set_memory_nonglobal() because they look for
> irqs_off().  This *should* be OK in practice because we only need to
> flush the boot CPU, not the others.  That's what ends up causing the
> BUG_ON().
>
> But, there's apparently something else going on too because things don't
> boot even with that BUG_ON() backed out.
>
> The good news is that it's easy to reproduce.

I'm hitting the same bug on my PC. Git bisect points to the same
commit (39114b7a743e6759bab4d96b7d9651d44d17e3f9).
AFAIK I don't use PREEMPT_RT, unless CONFIG_PREEMPT=y is the same as PREEMPT_RT.
I'm attaching my .config. My CPU is i5 6600.

The kernel seems to work in qemu when -enable-kvm and -cpu host are
removed from the qemu-system-x86_64 command line given in the other
mail thread.


PS. Resending mail as plain-text (hopefully, who knows what insane
gmail does), sorry for the previous mail (please ignore it).

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 40087 bytes --]

* Re: 4.17.0-rc1 doesn't boot.
  2018-04-17 16:48             ` Dave Hansen
  2018-04-17 17:48               ` Mariusz Ceier
@ 2018-04-17 19:56               ` Dave Hansen
  1 sibling, 0 replies; 10+ messages in thread
From: Dave Hansen @ 2018-04-17 19:56 UTC
  To: Dave Hansen, Mike Galbraith, Borislav Petkov, Jörg Otte
  Cc: Linus Torvalds, Thomas Gleixner, Linux Kernel Mailing List,
	Aaro Koskinen

Heh, your .config is insidious:

ffffffff9ffe3000 B __brk_base
ffffffff9ffe3000 B __bss_stop
ffffffff9fff3000 b .brk.dmi_alloc
ffffffffa0003000 b .brk.early_pgt_alloc
ffffffffa000f000 B _end
ffffffffa000f000 B __brk_limit

dmi_alloc is __init, so it gets freed at some point and the PTEs zeroed
out.  That causes the warning when change_page_attr() sees the zero'd
PTE.  We just need to special-case the __init section along with the
linear map in pageattr.c.
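
As a rough illustration of that special-casing, here is a toy user-space model, not the actual pageattr.c change. The function `toy_clear_global()` and the `TOY_PAGE_*` macros are hypothetical names; only the bit positions (present = bit 0, global = bit 8) follow the x86 page-table entry layout. The walk clears the global bit on live entries and silently skips entries that are already zero, as the PTEs of a freed __init range would be, instead of treating them as an error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; bit positions match x86 PTEs (P = bit 0, G = bit 8). */
#define TOY_PAGE_PRESENT (UINT64_C(1) << 0)
#define TOY_PAGE_GLOBAL  (UINT64_C(1) << 8)

/* Clear the global bit on every populated entry in ptes[0..n).
 * Zeroed entries (e.g. freed __init memory) are skipped rather than
 * reported. Returns the number of entries that were skipped. */
size_t toy_clear_global(uint64_t *ptes, size_t n)
{
	size_t skipped = 0;

	assert(ptes != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (ptes[i] == 0) {
			skipped++;
			continue;
		}
		ptes[i] &= ~TOY_PAGE_GLOBAL;
	}
	return skipped;
}
```

For a three-entry table whose middle entry has been zeroed, the function returns 1 and leaves the other two entries present but no longer global.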

I'll have some patches to do this shortly.
