linux-kernel.vger.kernel.org archive mirror
* instant reboot caused by 194a9749c73d650c0
@ 2018-04-15  4:39 Eric Dumazet
  2018-04-16  6:07 ` Ingo Molnar
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2018-04-15  4:39 UTC
  To: Kirill A. Shutemov, LKML, Ingo Molnar, Linus Torvalds

Hi Kirill

For some reason, my hosts instantly crash at boot time, with absolutely no log on console.

Bisection pointed to :

$ git bisect bad
194a9749c73d650c0b1dfdee04fb0bdf0a888ba8 is the first bad commit
commit 194a9749c73d650c0b1dfdee04fb0bdf0a888ba8
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Date:   Mon Mar 12 13:02:46 2018 +0300

    x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
    
    This patch addresses a shortcoming in current boot process on machines
    that supports 5-level paging.
    
    If a bootloader enables 64-bit mode with 4-level paging, we might need to
    switch over to 5-level paging. The switching requires the disabling
    paging. It works fine if kernel itself is loaded below 4G.
    
    But if the bootloader put the kernel above 4G (not sure if anybody does
    this), we would lose control as soon as paging is disabled, because the
    code becomes unreachable to the CPU.
    
    This patch implements a trampoline in lower memory to handle this
    situation.
    
    We only need the memory for a very short time, until the main kernel
    image sets up own page tables.
    
    We go through the trampoline even if we don't have to: if we're already
    in 5-level paging mode or if we don't need to switch to it. This way the
    trampoline gets tested on every boot.
    
    Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Borislav Petkov <bp@suse.de>
    Cc: Brian Gerst <brgerst@gmail.com>
    Cc: Cyrill Gorcunov <gorcunov@openvz.org>
    Cc: Denys Vlasenko <dvlasenk@redhat.com>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Josh Poimboeuf <jpoimboe@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20180312100246.89175-5-kirill.shutemov@linux.intel.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>


Reverting this patch solves the problem for me.

Thanks.


* Re: instant reboot caused by 194a9749c73d650c0
  2018-04-15  4:39 instant reboot caused by 194a9749c73d650c0 Eric Dumazet
@ 2018-04-16  6:07 ` Ingo Molnar
  2018-04-16  9:15   ` Kirill A. Shutemov
  0 siblings, 1 reply; 6+ messages in thread
From: Ingo Molnar @ 2018-04-16  6:07 UTC
  To: Eric Dumazet; +Cc: Kirill A. Shutemov, LKML, Linus Torvalds, Thomas Gleixner


* Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Hi Kirill
> 
> For some reason, my hosts instantly crash at boot time, with absolutely no log on console.
> 
> Bisection pointed to :
> 
> $ git bisect bad
> 194a9749c73d650c0b1dfdee04fb0bdf0a888ba8 is the first bad commit
> commit 194a9749c73d650c0b1dfdee04fb0bdf0a888ba8
> Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Date:   Mon Mar 12 13:02:46 2018 +0300
> 
>     x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G

Could you please send your .config? Early boot problems like this are sometimes
sensitive to the build environment and the Kconfig options.

A high-level description of your hardware and the distro you are using would also
be useful.

Kirill, I'm curious about this change:

-       /* Calculate address we are running at */
-       call    1f
-1:     popl    %edi
-       subl    $1b, %edi
+       /* Calculate address of paging_enabled() once we are executing in the trampoline */
+       leal    paging_enabled - trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_OFFSET(%ecx), %eax

Here we change the calculation from a "discover where we are executing" method to 
a calculation method (which is fundamentally more fragile) - why?

Thanks,

	Ingo
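
The two ways of computing the address of paging_enabled() discussed above can be compared with a small address-arithmetic model. This is only a sketch: the link-time addresses and the trampoline base are made-up values, and runtime_addr(), calculated() and discovered() are hypothetical helper names, not anything from the kernel tree. It assumes the trampoline code is copied to the trampoline base plus TRAMPOLINE_32BIT_CODE_OFFSET and that %ecx holds that base.

```python
# Sketch only: every address below is a made-up value for illustration.
MASK32 = 0xFFFFFFFF          # the trampoline runs in 32-bit mode

PAGE_SIZE = 0x1000
TRAMPOLINE_32BIT_CODE_OFFSET = PAGE_SIZE   # as in pgtable.h

# Hypothetical link-time addresses of the three symbols involved.
LINK_TRAMPOLINE_SRC = 0x2000   # trampoline_32bit_src
LINK_LABEL_1 = 0x2030          # local label "1:" right after the call
LINK_PAGING_ENABLED = 0x2050   # paging_enabled


def runtime_addr(link_addr, copy_base):
    """Where a trampoline symbol ends up once the code is copied to copy_base."""
    return (copy_base + (link_addr - LINK_TRAMPOLINE_SRC)) & MASK32


def calculated(ecx):
    """Models: leal paging_enabled - trampoline_32bit_src
                    + TRAMPOLINE_32BIT_CODE_OFFSET(%ecx), %eax"""
    return (LINK_PAGING_ENABLED - LINK_TRAMPOLINE_SRC
            + TRAMPOLINE_32BIT_CODE_OFFSET + ecx) & MASK32


def discovered(popped_eip):
    """Models: call 1f; 1: popl %eax; subl $1b, %eax;
               leal paging_enabled(%eax), %eax"""
    delta = (popped_eip - LINK_LABEL_1) & MASK32   # load offset of the running code
    return (LINK_PAGING_ENABLED + delta) & MASK32


trampoline_base = 0x9D000                                   # hypothetical low-memory address
copy_base = trampoline_base + TRAMPOLINE_32BIT_CODE_OFFSET  # where the code really runs
popped_eip = runtime_addr(LINK_LABEL_1, copy_base)          # what "popl" would return
target = runtime_addr(LINK_PAGING_ENABLED, copy_base)

# With correct inputs, both variants land on the same address.
print(hex(calculated(trampoline_base)), hex(discovered(popped_eip)), hex(target))
# A stale or wrong %ecx only affects the calculated variant.
print(hex(calculated(trampoline_base - PAGE_SIZE)), hex(discovered(popped_eip)))
```

With correct inputs the two variants agree. They differ only in what they trust: the calculated form depends on %ecx holding the trampoline base and on the code having been copied to TRAMPOLINE_32BIT_CODE_OFFSET, while the discovered form depends only on where the CPU is actually executing.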


* Re: instant reboot caused by 194a9749c73d650c0
  2018-04-16  6:07 ` Ingo Molnar
@ 2018-04-16  9:15   ` Kirill A. Shutemov
  2018-04-16 15:36     ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Kirill A. Shutemov @ 2018-04-16  9:15 UTC
  To: Ingo Molnar
  Cc: Eric Dumazet, Kirill A. Shutemov, LKML, Linus Torvalds, Thomas Gleixner

On Mon, Apr 16, 2018 at 08:07:09AM +0200, Ingo Molnar wrote:
> 
> * Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > Hi Kirill
> > 
> > For some reason, my hosts instantly crash at boot time, with absolutely no log on console.
> > 
> > Bisection pointed to :
> > 
> > $ git bisect bad
> > 194a9749c73d650c0b1dfdee04fb0bdf0a888ba8 is the first bad commit
> > commit 194a9749c73d650c0b1dfdee04fb0bdf0a888ba8
> > Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Date:   Mon Mar 12 13:02:46 2018 +0300
> > 
> >     x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
> 
> Could you please send your .config? These early boot problems are sometimes build 
> and Kconfig environment sensitive.
> 
> A high level description of your hardware and the distro you are using would also 
> be useful.

And how do you start the kernel? EFI? Legacy boot? kexec?

> 
> Kirill, I'm curious about this change:
> 
> -       /* Calculate address we are running at */
> -       call    1f
> -1:     popl    %edi
> -       subl    $1b, %edi
> +       /* Calculate address of paging_enabled() once we are executing in the trampoline */
> +       leal    paging_enabled - trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_OFFSET(%ecx), %eax
> 
> Here we change the calculation from a "discover where we are executing" method to 
> a calculation method (which is fundamentally more fragile) - why?

I guess I tried to save one register -- %rdi is used for the return from the
trampoline.

But you're right, there's no reason to do this and it may be more fragile.

Eric, could you check whether the patch below makes any difference?

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index fca012baba19..395c122ef70b 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -562,7 +562,10 @@ ENTRY(trampoline_32bit_src)
 	movl	%eax, %cr4
 
 	/* Calculate address of paging_enabled() once we are executing in the trampoline */
-	leal	paging_enabled - trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_OFFSET(%ecx), %eax
+	call	1f
+1:	popl	%eax
+	subl	$1b, %eax
+	leal	paging_enabled(%eax), %eax
 
 	/* Prepare the stack for far return to Long Mode */
 	pushl	$__KERNEL_CS
diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h
index 91f75638f6e6..6ff7e81b5628 100644
--- a/arch/x86/boot/compressed/pgtable.h
+++ b/arch/x86/boot/compressed/pgtable.h
@@ -6,7 +6,7 @@
 #define TRAMPOLINE_32BIT_PGTABLE_OFFSET	0
 
 #define TRAMPOLINE_32BIT_CODE_OFFSET	PAGE_SIZE
-#define TRAMPOLINE_32BIT_CODE_SIZE	0x60
+#define TRAMPOLINE_32BIT_CODE_SIZE	0x70
 
 #define TRAMPOLINE_32BIT_STACK_END	TRAMPOLINE_32BIT_SIZE
 
-- 
 Kirill A. Shutemov
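
The pgtable.h hunk above raises TRAMPOLINE_32BIT_CODE_SIZE because the head_64.S hunk replaces one instruction with four. A rough size budget can be sketched as follows. The byte counts are an assumption: they are the usual 32-bit encodings with 32-bit immediates and displacements, not values measured from an assembled object file, and the dictionary names are hypothetical.

```python
# Assumed encoded sizes in bytes (32-bit code, imm32/disp32 forms); not measured.
OLD_INSNS = {
    "leal <disp32>(%ecx), %eax": 6,        # 8d 81 + disp32
}
NEW_INSNS = {
    "call 1f": 5,                          # e8 + rel32
    "popl %eax": 1,                        # 58
    "subl $1b, %eax": 5,                   # 2d + imm32
    "leal paging_enabled(%eax), %eax": 6,  # 8d 80 + disp32
}

OLD_CODE_SIZE = 0x60   # TRAMPOLINE_32BIT_CODE_SIZE before the patch
NEW_CODE_SIZE = 0x70   # TRAMPOLINE_32BIT_CODE_SIZE after the patch

growth = sum(NEW_INSNS.values()) - sum(OLD_INSNS.values())
extra_budget = NEW_CODE_SIZE - OLD_CODE_SIZE

print(f"code grows by {growth} bytes, budget grows by {extra_budget} bytes")
assert extra_budget >= growth   # the larger budget covers the added instructions
```

Under these assumptions the added instructions fit within the extra space reserved by the new size. Whether the old code was already close to the 0x60 limit is not stated in the thread.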


* Re: instant reboot caused by 194a9749c73d650c0
  2018-04-16  9:15   ` Kirill A. Shutemov
@ 2018-04-16 15:36     ` Eric Dumazet
  2018-04-16 17:37       ` Kirill A. Shutemov
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2018-04-16 15:36 UTC
  To: Kirill A. Shutemov, Ingo Molnar
  Cc: Kirill A. Shutemov, LKML, Linus Torvalds, Thomas Gleixner, Greg Thelen



On 04/16/2018 02:15 AM, Kirill A. Shutemov wrote:
> On Mon, Apr 16, 2018 at 08:07:09AM +0200, Ingo Molnar wrote:
>>
>> * Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>> Hi Kirill
>>>
>>> For some reason, my hosts instantly crash at boot time, with absolutely no log on console.
>>>
>>> Bisection pointed to :
>>>
>>> $ git bisect bad
>>> 194a9749c73d650c0b1dfdee04fb0bdf0a888ba8 is the first bad commit
>>> commit 194a9749c73d650c0b1dfdee04fb0bdf0a888ba8
>>> Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>>> Date:   Mon Mar 12 13:02:46 2018 +0300
>>>
>>>     x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
>>
>> Could you please send your .config? These early boot problems are sometimes build 
>> and Kconfig environment sensitive.
>>
>> A high level description of your hardware and the distro you are using would also 
>> be useful.
> 
> And how do you start the kernel? EFI? Legacy boot? kexec?
> 
>>
>> Kirill, I'm curious about this change:
>>
>> -       /* Calculate address we are running at */
>> -       call    1f
>> -1:     popl    %edi
>> -       subl    $1b, %edi
>> +       /* Calculate address of paging_enabled() once we are executing in the trampoline */
>> +       leal    paging_enabled - trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_OFFSET(%ecx), %eax
>>
>> Here we change the calculation from a "discover where we are executing" method to 
>> a calculation method (which is fundamentally more fragile) - why?
> 
> I guess, I tried to save one register -- %rdi is used for return from
> trampoline.
> 
> But you're right there's no reason to do this and it may be more fragile.
> 
> Eric, could you check if the patch makes any difference?
> 
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index fca012baba19..395c122ef70b 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -562,7 +562,10 @@ ENTRY(trampoline_32bit_src)
>  	movl	%eax, %cr4
>  
>  	/* Calculate address of paging_enabled() once we are executing in the trampoline */
> -	leal	paging_enabled - trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_OFFSET(%ecx), %eax
> +	call	1f
> +1:	popl	%eax
> +	subl	$1b, %eax
> +	leal	paging_enabled(%eax), %eax
>  
>  	/* Prepare the stack for far return to Long Mode */
>  	pushl	$__KERNEL_CS
> diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h
> index 91f75638f6e6..6ff7e81b5628 100644
> --- a/arch/x86/boot/compressed/pgtable.h
> +++ b/arch/x86/boot/compressed/pgtable.h
> @@ -6,7 +6,7 @@
>  #define TRAMPOLINE_32BIT_PGTABLE_OFFSET	0
>  
>  #define TRAMPOLINE_32BIT_CODE_OFFSET	PAGE_SIZE
> -#define TRAMPOLINE_32BIT_CODE_SIZE	0x60
> +#define TRAMPOLINE_32BIT_CODE_SIZE	0x70
>  
>  #define TRAMPOLINE_32BIT_STACK_END	TRAMPOLINE_32BIT_SIZE
>  
> 

Hi Kirill

This patch did not help.

In the meantime, Greg told me that using gcc-4.9 instead of our old gcc-4.7-based toolchain was working better.

Thanks !


* Re: instant reboot caused by 194a9749c73d650c0
  2018-04-16 15:36     ` Eric Dumazet
@ 2018-04-16 17:37       ` Kirill A. Shutemov
  2018-04-16 18:16         ` Kirill A. Shutemov
  0 siblings, 1 reply; 6+ messages in thread
From: Kirill A. Shutemov @ 2018-04-16 17:37 UTC
  To: Eric Dumazet
  Cc: Ingo Molnar, Kirill A. Shutemov, LKML, Linus Torvalds,
	Thomas Gleixner, Greg Thelen

On Mon, Apr 16, 2018 at 08:36:57AM -0700, Eric Dumazet wrote:
> 
> 
> On 04/16/2018 02:15 AM, Kirill A. Shutemov wrote:
> In the mean time, Greg told me that using gcc-4.9 instead of our old
> gcc-4.7 based toolchain was working better.

Hm. Are you saying that switching from gcc-4.7 to gcc-4.9 fixed the issue
for you? That's worth investigating.

-- 
 Kirill A. Shutemov


* Re: instant reboot caused by 194a9749c73d650c0
  2018-04-16 17:37       ` Kirill A. Shutemov
@ 2018-04-16 18:16         ` Kirill A. Shutemov
  0 siblings, 0 replies; 6+ messages in thread
From: Kirill A. Shutemov @ 2018-04-16 18:16 UTC
  To: Eric Dumazet
  Cc: Ingo Molnar, Kirill A. Shutemov, LKML, Linus Torvalds,
	Thomas Gleixner, Greg Thelen

On Mon, Apr 16, 2018 at 08:37:52PM +0300, Kirill A. Shutemov wrote:
> On Mon, Apr 16, 2018 at 08:36:57AM -0700, Eric Dumazet wrote:
> > 
> > 
> > On 04/16/2018 02:15 AM, Kirill A. Shutemov wrote:
> > In the mean time, Greg told me that using gcc-4.9 instead of our old
> > gcc-4.7 based toolchain was working better.
> 
> Hm. Are you saying that switching from gcc-4.7 to gcc-4.9 fixed the issue
> for you? That's worth investigating.

I checked a kernel built with gcc-4.6 and it boots fine...

-- 
 Kirill A. Shutemov
