linux-kernel.vger.kernel.org archive mirror
* Re: [BUG] Linux 3.14 fails to boot with new EFI changes
       [not found]                 ` <20140129141822.GC24887@pd.tnic>
@ 2014-01-30 22:19                   ` Alex Thorlton
  2014-01-30 22:23                     ` H. Peter Anvin
  2014-01-31  8:04                     ` Matt Fleming
  0 siblings, 2 replies; 14+ messages in thread
From: Alex Thorlton @ 2014-01-30 22:19 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: H. Peter Anvin, Matt Fleming, Russ Anderson, linux-kernel

Re-adding lkml.

Here's what we've got as far as your questions go (snipped from an
e-mail from Russ):

<snip>
> * what kind of a pointer is that, physical address, or?

The quick answer is I think it is a virtual address, because
it does not work in physical mode.  If you ever see "virtefi"
on the RHEL bootline it is because RH switched the default
to physical mode, which caused UV to not boot.  "virtefi"
forced it back to virtual mode.

> * does it get switched to a virtual address after
> SetVirtualAddressMap()?

I believe SetVirtualAddressMap() creates the virtual mappings,
but I would have to look at the code to be sure.

> * Which region in your UEFI map contains that function, what are the
> modalities about mapping it?

Not sure about the region.  The MMRs are in an uncached range
and the functions are cached and marked executable.  But I have to
double-check on that.
</snip>

> * Anything else I should know?

Probably, but nothing that's jumping out at us right now :)

Let us know if you need anything else.

Thanks!

- Alex


* Re: [BUG] Linux 3.14 fails to boot with new EFI changes
  2014-01-30 22:19                   ` [BUG] Linux 3.14 fails to boot with new EFI changes Alex Thorlton
@ 2014-01-30 22:23                     ` H. Peter Anvin
  2014-01-31 10:07                       ` Borislav Petkov
  2014-01-31  8:04                     ` Matt Fleming
  1 sibling, 1 reply; 14+ messages in thread
From: H. Peter Anvin @ 2014-01-30 22:23 UTC (permalink / raw)
  To: Alex Thorlton, Borislav Petkov; +Cc: Matt Fleming, Russ Anderson, linux-kernel

On 01/30/2014 02:19 PM, Alex Thorlton wrote:
> 
> The quick answer is I think it is a virtual address, because
> it does not work in physical mode.  If you ever see "virtefi"
> on the RHEL bootline it is because RH switched the default
> to physical mode, which caused UV to not boot.  "virtefi"
> forced it back to virtual mode.
> 

That is interesting, as it is definitely not the direction we have been
going in within the Linux community.

	-hpa




* Re: [BUG] Linux 3.14 fails to boot with new EFI changes
  2014-01-30 22:19                   ` [BUG] Linux 3.14 fails to boot with new EFI changes Alex Thorlton
  2014-01-30 22:23                     ` H. Peter Anvin
@ 2014-01-31  8:04                     ` Matt Fleming
  2014-01-31 13:53                       ` Russ Anderson
  1 sibling, 1 reply; 14+ messages in thread
From: Matt Fleming @ 2014-01-31  8:04 UTC (permalink / raw)
  To: Alex Thorlton
  Cc: Borislav Petkov, H. Peter Anvin, Russ Anderson, linux-kernel, linux-efi

On Thu, 30 Jan, at 04:19:50PM, Alex Thorlton wrote:
> Re-adding lkml.
 
Also add linux-efi.

> The quick answer is I think it is a virtual address, because
> it does not work in physical mode.  If you ever see "virtefi"
> on the RHEL bootline it is because RH switched the default
> to physical mode, which caused UV to not boot.  "virtefi"
> forced it back to virtual mode.

Do you have details of the failure, links to bug reports? Is it a
limitation of the firmware?

-- 
Matt Fleming, Intel Open Source Technology Center


* Re: [BUG] Linux 3.14 fails to boot with new EFI changes
  2014-01-30 22:23                     ` H. Peter Anvin
@ 2014-01-31 10:07                       ` Borislav Petkov
  2014-01-31 14:02                         ` Russ Anderson
  0 siblings, 1 reply; 14+ messages in thread
From: Borislav Petkov @ 2014-01-31 10:07 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Alex Thorlton, Matt Fleming, Russ Anderson, linux-kernel

On Thu, Jan 30, 2014 at 02:23:46PM -0800, H. Peter Anvin wrote:
> On 01/30/2014 02:19 PM, Alex Thorlton wrote:
> >
> > The quick answer is I think it is a virtual address, because it does
> > not work in physical mode. If you ever see "virtefi" on the RHEL
> > bootline it is because RH switched the default to physical mode,
> > which caused UV to not boot. "virtefi" forced it back to virtual
> > mode.
> > 
> That is interesting, as it is definitely not the direction we have been
> going in within the Linux community.

Right, for the new scheme to work, we'll have to map the region
containing the code for uv_systab->function in order to do all those
uv_bios_call()'s. Physical/virtual shouldn't matter all that much
because we map the region *both* as a 1:1 map and in virtual space too.

Can SGI please give us a reliable way to do that during boot?

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [BUG] Linux 3.14 fails to boot with new EFI changes
  2014-01-31  8:04                     ` Matt Fleming
@ 2014-01-31 13:53                       ` Russ Anderson
  0 siblings, 0 replies; 14+ messages in thread
From: Russ Anderson @ 2014-01-31 13:53 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Alex Thorlton, Borislav Petkov, H. Peter Anvin, linux-kernel, linux-efi

On Fri, Jan 31, 2014 at 08:04:28AM +0000, Matt Fleming wrote:
> On Thu, 30 Jan, at 04:19:50PM, Alex Thorlton wrote:
> > Re-adding lkml.
>  
> Also add linux-efi.
> 
> > The quick answer is I think it is a virtual address, because
> > it does not work in physical mode.  If you ever see "virtefi"
> > on the RHEL bootline it is because RH switched the default
> > to physical mode, which caused UV to not boot.  "virtefi"
> > forced it back to virtual mode.
> 
> Do you have details of the failure, links to bug reports? Is it a
> limitation of the firmware?

That was a non-upstream regression in the distro kernel.  The
3.13 community kernel boots fine.  The current problem is a
regression introduced in this merge window which needs to be fixed.


-- 
Russ Anderson,  Kernel and Performance Software Team Manager
SGI - Silicon Graphics Inc          rja@sgi.com


* Re: [BUG] Linux 3.14 fails to boot with new EFI changes
  2014-01-31 10:07                       ` Borislav Petkov
@ 2014-01-31 14:02                         ` Russ Anderson
  2014-01-31 14:23                           ` Borislav Petkov
  0 siblings, 1 reply; 14+ messages in thread
From: Russ Anderson @ 2014-01-31 14:02 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: H. Peter Anvin, Alex Thorlton, Matt Fleming, linux-kernel, linux-efi

On Fri, Jan 31, 2014 at 11:07:22AM +0100, Borislav Petkov wrote:
> On Thu, Jan 30, 2014 at 02:23:46PM -0800, H. Peter Anvin wrote:
> > On 01/30/2014 02:19 PM, Alex Thorlton wrote:
> > >
> > > The quick answer is I think it is a virtual address, because it does
> > > not work in physical mode. If you ever see "virtefi" on the RHEL
> > > bootline it is because RH switched the default to physical mode,
> > > which caused UV to not boot. "virtefi" forced it back to virtual
> > > mode.
> > > 
> > That is interesting, as it is definitely not the direction we have been
> > going in within the Linux community.
> 
> Right, for the new scheme to work, we'll have to map the region
> containing the code for uv_systab->function in order to do all those
> uv_bios_call()'s. Physical/virtual shouldn't matter all that much
> because we map the region *both* as a 1:1 map and in virtual space too.
> 
> Can SGI please give us a reliable way to do that during boot?

I'm not sure what you are asking for.  We had a reliable way to
boot before the recent patch broke it. (commit
d2f7cbe7b26a74dbbbf8f325b2a6fd01bc34032c)


-- 
Russ Anderson,  Kernel and Performance Software Team Manager
SGI - Silicon Graphics Inc          rja@sgi.com


* Re: [BUG] Linux 3.14 fails to boot with new EFI changes
  2014-01-31 14:02                         ` Russ Anderson
@ 2014-01-31 14:23                           ` Borislav Petkov
  2014-01-31 14:36                             ` Borislav Petkov
  2014-02-05 21:45                             ` Alex Thorlton
  0 siblings, 2 replies; 14+ messages in thread
From: Borislav Petkov @ 2014-01-31 14:23 UTC (permalink / raw)
  To: Russ Anderson
  Cc: H. Peter Anvin, Alex Thorlton, Matt Fleming, linux-kernel, linux-efi

On Fri, Jan 31, 2014 at 08:02:21AM -0600, Russ Anderson wrote:
> I'm not sure what you are asking for. We had a reliable
> way to boot before the recent patch broke it. (commit
> d2f7cbe7b26a74dbbbf8f325b2a6fd01bc34032c)

So we should stop any further development just because your machines did
boot nicely before that. What about the other machines and kexec we're
fixing with the work above? Jeez...

Ok, let me give it in a more detailed fashion for ya:

1. uv_bios_init remaps the UV systab table which is at physical address
efi.uv_systab

2. Then, it copies it into uv_systab. The purpose of this is for you to
be able to call the uv_systab.function callback and thus call into your
firmware.

3. Now, uv_systab.function points to a region of code which contains
your BIOS callback, and that pointer is the entry point into that
function.

And now my question:

How can I reliably find out which region contains that
uv_systab.function call?

I need it so that I can map it in the EFI page table and you can
continue to call that function and you can get back to your reliable way
to boot.
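
The lookup being asked for can be sketched roughly as below. This is only
an illustration, not kernel code: struct mem_region is a hypothetical,
simplified stand-in for an EFI memory descriptor, and find_region() is a
made-up helper name. It assumes the callback's entry point is a physical
address that falls inside one of the regions the firmware reports.

```c
#include <stddef.h>
#include <stdint.h>

#define EFI_PAGE_SHIFT 12	/* EFI memory map counts 4 KiB pages */

/* Hypothetical, simplified descriptor (not the kernel's efi_memory_desc_t). */
struct mem_region {
	uint64_t phys_start;	/* first byte of the region */
	uint64_t num_pages;	/* length in 4 KiB pages */
};

/* Return the region containing addr, or NULL if no region covers it. */
static const struct mem_region *
find_region(const struct mem_region *map, size_t n, uint64_t addr)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t start = map[i].phys_start;
		uint64_t size = map[i].num_pages << EFI_PAGE_SHIFT;

		/* addr - start cannot underflow once addr >= start holds */
		if (addr >= start && addr - start < size)
			return &map[i];
	}
	return NULL;
}
```

If the lookup returns NULL, the entry point lies outside every reported
region, and that would be the case where extra information from the
firmware side is needed.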

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [BUG] Linux 3.14 fails to boot with new EFI changes
  2014-01-31 14:23                           ` Borislav Petkov
@ 2014-01-31 14:36                             ` Borislav Petkov
  2014-02-05 21:45                             ` Alex Thorlton
  1 sibling, 0 replies; 14+ messages in thread
From: Borislav Petkov @ 2014-01-31 14:36 UTC (permalink / raw)
  To: Russ Anderson
  Cc: H. Peter Anvin, Alex Thorlton, Matt Fleming, linux-kernel, linux-efi

On Fri, Jan 31, 2014 at 03:23:18PM +0100, Borislav Petkov wrote:
> So we should stop any further development just because your machines did
> boot nicely before that. What about the other machines and kexec we're
> fixing with the work above? Jeez...

Alternatively, we can force the old memmap method on UV. This solution
would be a lot less work for me and none for you so I'll be very much
fine with it.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [BUG] Linux 3.14 fails to boot with new EFI changes
  2014-01-31 14:23                           ` Borislav Petkov
  2014-01-31 14:36                             ` Borislav Petkov
@ 2014-02-05 21:45                             ` Alex Thorlton
  2014-02-05 23:15                               ` Borislav Petkov
  1 sibling, 1 reply; 14+ messages in thread
From: Alex Thorlton @ 2014-02-05 21:45 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Russ Anderson, H. Peter Anvin, Matt Fleming, linux-kernel, linux-efi

On Fri, Jan 31, 2014 at 03:23:18PM +0100, Borislav Petkov wrote:
> And now my question:
> 
> How can I reliably find out which region contains that
> uv_systab.function call?
> 
> I need it so that I can map it in the EFI page table and you can
> continue to call that function and you can get back to your reliable way
> to boot.

While working on an answer to this question, I ran across another issue
on some newer hardware, that looks like it's definitely related to this
problem, and might be the root cause.

When booting on a UV2 we die in efi_enter_virtual_mode:

BUG: unable to handle kernel paging request at 0000008f7e848020
IP: [<000000007dadb6a9>] 0x7dadb6a8
PGD 0
Oops: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.11.0-medusa-00038-gd2f7cbe #821
Hardware name: SGI UV2000/ROMLEY, BIOS SGI UV 2000/3000 series BIOS 01/15/2013
task: ffffffff817ae400 ti: ffffffff8179e000 task.ti: ffffffff8179e000
RIP: 0010:[<000000007dadb6a9>]  [<000000007dadb6a9>] 0x7dadb6a8
RSP: 0000:ffffffff8179fd90  EFLAGS: 00010202
RAX: 000000007d9b8e01 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 000000007db074f8 RSI: 000000007d9b8e18 RDI: 0000008f7e848000
RBP: 000000007db074f8 R08: 0000000000000001 R09: 0000008f7e848000
R10: 0000000000000030 R11: ffff880ffffda000 R12: 8000000000000000
R13: 000077ff80000000 R14: ffff888f7e848000 R15: 000000000009b000
FS:  0000000000000000(0000) GS:ffff880fffc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000008f7e848020 CR3: 000000000009b000 CR4: 00000000000406b0
Stack:
 fffffffef6000000 0000000000000000 0000000000000000 80000000000001e3
 0000000000000030 000000007dadd540 0000000000001f20 0000000060000202
 000000007d9b8da0 000000007daf8c6b ffffffff810fd2bc 00000000000000d0
Call Trace:
 [<ffffffff810fd2bc>] ? cache_grow+0x1e5/0x236
 [<ffffffff8103adac>] ? efi_call4+0x6c/0xf0
 [<ffffffff8186452f>] ? efi_enter_virtual_mode+0x1ac/0x328
 [<ffffffff8184de98>] ? start_kernel+0x35b/0x3ed
 [<ffffffff8184d950>] ? repair_env_string+0x60/0x60
 [<ffffffff8184d479>] ? x86_64_start_reservations+0x2e/0x30
 [<ffffffff8184d5a3>] ? x86_64_start_kernel+0x128/0x12f
Code:  Bad RIP value.
RIP  [<000000007dadb6a9>] 0x7dadb6a8
 RSP <ffffffff8179fd90>
CR2: 0000008f7e848020
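
A note for reading the register dump above. This is only one
interpretation, not something established in the thread: R14 looks like
the kernel direct-map alias of the address held in RDI and R09, and CR2
is that address plus 0x20. The sketch assumes the x86-64 direct-map base
used by kernels of this era (0xffff880000000000); the helper names are
hypothetical.

```c
#include <stdint.h>

/* Assumption: direct-map base on x86-64 kernels of this vintage. */
#define PAGE_OFFSET 0xffff880000000000ULL

/* Physical address -> direct-map virtual address. */
static uint64_t direct_map_va(uint64_t phys)
{
	return phys + PAGE_OFFSET;
}

/* Direct-map virtual address -> physical address. */
static uint64_t direct_map_pa(uint64_t virt)
{
	return virt - PAGE_OFFSET;
}
```

direct_map_pa(0xffff888f7e848000) gives 0x8f7e848000, the value in RDI.
That would be consistent with the faulting access going through a raw
physical address that the page table in use at that point does not map.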

Let me know what other information you need and I'll get it to you ASAP.

Thanks!

- Alex


* Re: [BUG] Linux 3.14 fails to boot with new EFI changes
  2014-02-05 21:45                             ` Alex Thorlton
@ 2014-02-05 23:15                               ` Borislav Petkov
  2014-02-11 22:19                                 ` Alex Thorlton
  0 siblings, 1 reply; 14+ messages in thread
From: Borislav Petkov @ 2014-02-05 23:15 UTC (permalink / raw)
  To: Alex Thorlton
  Cc: Russ Anderson, H. Peter Anvin, Matt Fleming, linux-kernel, linux-efi

On Wed, Feb 05, 2014 at 03:45:36PM -0600, Alex Thorlton wrote:
> While working on an answer to this question, I ran across another issue
> on some newer hardware, that looks like it's definitely related to this
> problem, and might be the root cause.
> 
> When booting on a UV2 we die in efi_enter_virtual_mode:
> 
> BUG: unable to handle kernel paging request at 0000008f7e848020
> IP: [<000000007dadb6a9>] 0x7dadb6a8
> PGD 0

That looks very much like this other issue we're debugging right now:

http://marc.info/?l=linux-kernel&m=139115794830637

> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.11.0-medusa-00038-gd2f7cbe #821

...

> Let me know what other information you need and I'll get it to you ASAP.

You could try to boot latest linus + Matt's 'next' branch on top and see
whether it is exploding. I'd venture a guess and say yes but a whole
dmesg with CONFIG_EFI_PGT_DUMP and pagetable dump might still give us
some clues.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [BUG] Linux 3.14 fails to boot with new EFI changes
  2014-02-05 23:15                               ` Borislav Petkov
@ 2014-02-11 22:19                                 ` Alex Thorlton
  2014-02-11 22:36                                   ` Borislav Petkov
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Thorlton @ 2014-02-11 22:19 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Russ Anderson, H. Peter Anvin, Matt Fleming, linux-kernel, linux-efi

On Thu, Feb 06, 2014 at 12:15:40AM +0100, Borislav Petkov wrote:
> On Wed, Feb 05, 2014 at 03:45:36PM -0600, Alex Thorlton wrote:
> > While working on an answer to this question, I ran across another issue
> > on some newer hardware, that looks like it's definitely related to this
> > problem, and might be the root cause.
> > 
> > When booting on a UV2 we die in efi_enter_virtual_mode:
> > 
> > BUG: unable to handle kernel paging request at 0000008f7e848020
> > IP: [<000000007dadb6a9>] 0x7dadb6a8
> > PGD 0
> 
> That looks very much like this other issue we're debugging right now:

Have there been any developments on this since last week, Boris?  Just
trying to make sure that we stay in the loop on this issue.

Let me know if there's anything else we can do from our end to help
expedite the process.  I'm always available to test new ideas.

Thanks!

- Alex


* Re: [BUG] Linux 3.14 fails to boot with new EFI changes
  2014-02-11 22:19                                 ` Alex Thorlton
@ 2014-02-11 22:36                                   ` Borislav Petkov
  0 siblings, 0 replies; 14+ messages in thread
From: Borislav Petkov @ 2014-02-11 22:36 UTC (permalink / raw)
  To: Alex Thorlton
  Cc: Russ Anderson, H. Peter Anvin, Matt Fleming, linux-kernel, linux-efi

On Tue, Feb 11, 2014 at 04:19:03PM -0600, Alex Thorlton wrote:
> Have there been any developments on this since last week, Boris? Just
> trying to make sure that we stay in the loop on this issue.
>
> Let me know if there's anything else we can do from our end to help
> expedite the process. I'm always available to test new ideas.

No change. The last failure I have received is the one below, which
basically explodes at the same place as the first one. In order to fix
that, I'd either

1) need from you the info about how to reliably find out which region
contains that uv_systab.function call and its size so that I can map it
1:1

or, if you don't want to share that information

2) let me know so that we can quirk UV2 to use the old EFI mapping code.
This is much easier as we would only need a reliable way to detect a
UV2 system at or before efi_enter_virtual_mode().
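
Option 2 could look roughly like the sketch below. It is a userspace
illustration only: enum platform and want_old_map() are hypothetical
names, and how the kernel would actually detect a UV2 this early is
exactly the open question. The "old_map" string is the efi= chicken bit
from commit d2f7cbe7b26a.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical platform IDs, for illustration only. */
enum platform { PLATFORM_GENERIC, PLATFORM_UV2 };

/* Decide whether the old runtime services mapping should be used. */
static bool want_old_map(enum platform p, const char *efi_opts)
{
	/* Explicit request from the command line, e.g. efi=old_map. */
	if (efi_opts && strstr(efi_opts, "old_map"))
		return true;

	/* Platform quirk: fall back automatically on UV2. */
	return p == PLATFORM_UV2;
}
```

The point of the quirk is that UV2 users would not have to pass
efi=old_map by hand, while every other machine keeps the new mapping.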

About http://marc.info/?l=linux-kernel&m=139115794830637 - I haven't
received any feedback yet after I asked them to test with Matt's
next branch. You did that on your machine and the explosion in
efi_enter_virtual_mode was fixed. But, the problem above still remains.

HTH.

smpboot: CPU0: Genuine Intel(R) CPU  @ 2.60GHz (fam: 06, model: 2d, stepping: 06)
UV: Found UV2 hub
------------[ cut here ]------------
kernel BUG at arch/x86/mm/init_64.c:351!
invalid opcode: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.0-rc1-medusa-00099-g4532934-dirty #827
Hardware name: SGI UV2000/ROMLEY, BIOS SGI UV 2000/3000 series BIOS 01/15/2013
task: ffff880ff9678010 ti: ffff880ff967a000 task.ti: ffff880ff967a000
RIP: 0010:[<ffffffff818b25a9>]  [<ffffffff818b25a9>] __init_extra_mapping+0x111/0x143
RSP: 0000:ffff880ff967bd18  EFLAGS: 00010206
RAX: 0000000000000f00 RBX: ffff880001c53018 RCX: 0000000000000002
RDX: ffff89ef7d83ef00 RSI: 0000000002000000 RDI: 00000000fc000000
RBP: ffff880ff967bd48 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000007dbfbc R11: 0000000000000000 R12: 00000000fc000000
R13: 0000000002000000 R14: ffff8800fc000000 R15: 0000000080000000
FS:  0000000000000000(0000) GS:ffff880fffc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff89ef7efff000 CR3: 00000000017df000 CR4: 00000000000406f0
Stack:
 80000000000001fb 0000000000000000 0000000000000100 000000000000b018
 000000000000b010 000000000000b008 ffff880ff967bd58 ffffffff818b25ee
 ffff880ff967be28 ffffffff818adcf6 ffff880fffc0cc40 0000000000000100
Call Trace:
 [<ffffffff818b25ee>] init_extra_mapping_uc+0x13/0x15
 [<ffffffff818adcf6>] uv_system_init+0x102/0x111d
 [<ffffffff8108c220>] ? clockevents_config_and_register+0x21/0x25
 [<ffffffff81028dc5>] ? setup_APIC_timer+0xbb/0xc7
 [<ffffffff81541910>] ? printk+0x72/0x74
 [<ffffffff818aba91>] ? setup_boot_APIC_clock+0x4a8/0x4b7
 [<ffffffff81541910>] ? printk+0x72/0x74
 [<ffffffff818a9756>] native_smp_prepare_cpus+0x389/0x3d6
 [<ffffffff8189d7bc>] kernel_init_freeable+0xb7/0x1fb
 [<ffffffff81539480>] ? rest_init+0x74/0x74
 [<ffffffff81539489>] kernel_init+0x9/0xd5
 [<ffffffff81545d7c>] ret_from_fork+0x7c/0xb0
 [<ffffffff81539480>] ? rest_init+0x74/0x74
Code: ff ff ff 3f 00 00 48 23 13 48 b8 00 00 00 00 00 88 ff ff 48 01 c2 4c 89 e0 48 c1 e8 12 25 f8 0f 00 00 48 01 c2 48 83 3a 00 74 04 <0f> 0b eb fe 48 8b 45 d0 49 81 ed 00 00 20 00 4c 09 e0 49 81 c4
RIP  [<ffffffff818b25a9>] __init_extra_mapping+0x111/0x143
 RSP <ffff880ff967bd18>
---[ end trace e093a3f084996fbc ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [BUG] Linux 3.14 fails to boot with new EFI changes
  2014-01-23 22:11 Alex Thorlton
@ 2014-01-23 22:48 ` Borislav Petkov
  0 siblings, 0 replies; 14+ messages in thread
From: Borislav Petkov @ 2014-01-23 22:48 UTC (permalink / raw)
  To: Alex Thorlton; +Cc: linux-kernel, Borislav Petkov, Matt Fleming

On Thu, Jan 23, 2014 at 04:11:08PM -0600, Alex Thorlton wrote:
> We've been hitting the following bug in the latest kernel, during boot:

Can you merge

git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi.git#next

into your tree, enable CONFIG_EFI_PGT_DUMP, apply the debugging patch
below, catch the whole dmesg and send it to me (privately is fine too)?

Thanks.

---
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index f35c66c5959a..07712b9b4263 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -348,7 +348,12 @@ static void __init __init_extra_mapping(unsigned long phys, unsigned long size,
 						_PAGE_USER));
 		}
 		pmd = pmd_offset(pud, phys);
-		BUG_ON(!pmd_none(*pmd));
+		if (!pmd_none(*pmd)) {
+			pr_err("phys: 0x%lx, PGD: 0x%lx, PUD: 0x%lx, PMD: 0x%lx\n",
+				phys, pgd_val(*pgd), pud_val(*pud), pmd_val(*pmd));
+			BUG();
+		}
+
 		set_pmd(pmd, __pmd(phys | pgprot_val(prot)));
 	}
 }
--
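
As an aid for reading the values the debugging patch prints, here is a
minimal sketch of which PMD slot an address selects. It assumes x86-64
4-level paging, where one PMD entry covers 2 MiB and a PMD table holds
512 eight-byte entries; pmd_slot() and pmd_byte_offset() are
hypothetical helper names, not kernel functions.

```c
#include <stdint.h>

#define PMD_SHIFT	21	/* one PMD entry maps 2 MiB */
#define PTRS_PER_PMD	512	/* entries per PMD table */

/* Index of the PMD entry that covers addr. */
static unsigned int pmd_slot(uint64_t addr)
{
	return (unsigned int)((addr >> PMD_SHIFT) & (PTRS_PER_PMD - 1));
}

/* Byte offset of that entry within its PMD table (8 bytes per entry). */
static uint64_t pmd_byte_offset(uint64_t addr)
{
	return (uint64_t)pmd_slot(addr) * sizeof(uint64_t);
}
```

For 0xf8000000 and 0xfc000000, which appear to be the phys arguments in
the two init_64.c:351 traces in this thread (RDI/R12), this gives byte
offsets 0xe00 and 0xf00, matching RAX in the respective register dumps.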

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* [BUG] Linux 3.14 fails to boot with new EFI changes
@ 2014-01-23 22:11 Alex Thorlton
  2014-01-23 22:48 ` Borislav Petkov
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Thorlton @ 2014-01-23 22:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Borislav Petkov, Matt Fleming

We've been hitting the following bug in the latest kernel, during boot:

kernel BUG at arch/x86/mm/init_64.c:351!
invalid opcode: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-medusa-04156-g90804ed-dirty #750
Hardware name: Intel Corp. Stoutland Platform, BIOS 2.0 UEFI2.10 PI1.0 X64 2013-09-16
task: ffff88107c96c010 ti: ffff88107c96e000 task.ti: ffff88107c96e000
RIP: 0010:[<ffffffff818aa6d8>]  [<ffffffff818aa6d8>] __init_extra_mapping+0x111/0x143
RSP: 0000:ffff88107c96fd18  EFLAGS: 00010206
RAX: 0000000000000e00 RBX: ffff880001c4a018 RCX: 0000000000000002
RDX: ffff88107fcd1e00 RSI: 0000000004000000 RDI: 00000000f8000000
RBP: ffff88107c96fd48 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000007e3b3c R11: 0000000000000000 R12: 00000000f8000000
R13: 0000000004000000 R14: ffff8800f8000000 R15: 0000000080000000
FS:  0000000000000000(0000) GS:ffff880073200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff8827fefff000 CR3: 00000000017d7000 CR4: 00000000000006f0
Stack:
 80000000000001fb 0000000000000000 0000000000000040 000000000000b018
 000000000000b010 000000000000b008 ffff88107c96fd58 ffffffff818aa71d
 ffff88107c96fe28 ffffffff818a5e20 00000000000cb748 0000000000000282
Call Trace:
 [<ffffffff818aa71d>] init_extra_mapping_uc+0x13/0x15
 [<ffffffff818a5e20>] uv_system_init+0x22b/0x124b
 [<ffffffff8108b886>] ? clockevents_register_device+0x138/0x13d
 [<ffffffff81028dbb>] ? setup_APIC_timer+0xc5/0xc7
 [<ffffffff8108b620>] ? clockevent_delta2ns+0xb/0xd
 [<ffffffff818a3a92>] ? setup_boot_APIC_clock+0x4a8/0x4b7
 [<ffffffff8153d955>] ? printk+0x72/0x74
 [<ffffffff818a1757>] native_smp_prepare_cpus+0x389/0x3d6
 [<ffffffff818957bc>] kernel_init_freeable+0xb7/0x1fb
 [<ffffffff81535530>] ? rest_init+0x74/0x74
 [<ffffffff81535539>] kernel_init+0x9/0xff
 [<ffffffff81541dfc>] ret_from_fork+0x7c/0xb0
 [<ffffffff81535530>] ? rest_init+0x74/0x74
Code: ff ff ff 3f 00 00 48 23 13 48 b8 00 00 00 00 00 88 ff ff 48 01 c2 4c 89 e0 48 c1 e8 12 25 f8 0f 00 00 48 01 c2 48 83 3a 00 74 04 <0f> 0b eb fe 48 8b 45 d0 49 81 ed 00 00 20 00 4c 09 e0 49 81 c4
RIP  [<ffffffff818aa6d8>] __init_extra_mapping+0x111/0x143
 RSP <ffff88107c96fd18>

I've bisected the issue down to this commit:

commit d2f7cbe7b26a74dbbbf8f325b2a6fd01bc34032c
Author: Borislav Petkov <bp@suse.de>
Date:   Thu Oct 31 17:25:08 2013 +0100

    x86/efi: Runtime services virtual mapping

    We map the EFI regions needed for runtime services non-contiguously,
    with preserved alignment on virtual addresses starting from -4G down
    for a total max space of 64G. This way, we provide for stable runtime
    services addresses across kernels so that a kexec'd kernel can still use
    them.

    Thus, they're mapped in a separate pagetable so that we don't pollute
    the kernel namespace.

    Add an efi= kernel command line parameter for passing miscellaneous
    options and chicken bits from the command line.

    While at it, add a chicken bit called "efi=old_map" which can be used as
    a fallback to the old runtime services mapping method in case there's
    some b0rkage with a particular EFI implementation (haha, it is hard to                                                                                                                                                                                                                                                   
    hold up the sarcasm here...).                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                                                             
    Also, add the UEFI RT VA space to Documentation/x86/x86_64/mm.txt.                                                                                                                                                                                                                                                       
                                                                                                                                                                                                                                                                                                                             
    Signed-off-by: Borislav Petkov <bp@suse.de>                                                                                                                                                                                                                                                                              
    Signed-off-by: Matt Fleming <matt.fleming@intel.com>
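
For reference, the "-4G down for a total max space of 64G" wording in the
commit message can be turned into concrete 64-bit addresses. The following
is only a sketch of that arithmetic, not kernel code; the names
EFI_VA_TOP, EFI_VA_BOTTOM and in_efi_window are made up for illustration.

```python
# Sketch of the address arithmetic described in the commit message.
# All names here are hypothetical; this is not taken from the kernel source.
GIB = 1 << 30
MASK64 = (1 << 64) - 1


def to_u64(value):
    """Wrap a (possibly negative) offset into a 64-bit unsigned address."""
    return value & MASK64


# "-4G" read as a 64-bit virtual address: the top of the mapping window.
EFI_VA_TOP = to_u64(-4 * GIB)
# 64G of total space below the top gives the lowest address of the window.
EFI_VA_BOTTOM = to_u64(-4 * GIB - 64 * GIB)


def in_efi_window(addr):
    """Return True if addr falls inside the 64G window below -4G."""
    return EFI_VA_BOTTOM <= addr < EFI_VA_TOP


print(hex(EFI_VA_BOTTOM), hex(EFI_VA_TOP))
```

Under these assumptions the window runs from 0xffffffef00000000 up to
0xffffffff00000000.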

Running with efi=old_map does seem to remedy the problem.  One
solution, proposed by Mike Travis (travis@sgi.com), is to switch the
behavior over so that you have to provide a command line parameter to
activate the new behavior, instead of one to deactivate it.
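
To make the difference between the two policies concrete, here is a rough
sketch of how the choice could be derived from the command line. It is
only an illustration: the "new_map" option name is hypothetical (the
commit only defines "old_map"), comma-separated efi= options are assumed,
and the function names are made up.

```python
# Sketch of the two default policies; not the kernel's actual parser.
def parse_efi_options(cmdline):
    """Collect options from every efi=... token (comma-separated, assumed)."""
    opts = set()
    for token in cmdline.split():
        if token.startswith("efi="):
            opts.update(o for o in token[len("efi="):].split(",") if o)
    return opts


def use_old_map(cmdline, new_map_opt_in=False):
    """Decide whether the old runtime services mapping is used."""
    opts = parse_efi_options(cmdline)
    if new_map_opt_in:
        # Proposed inversion: keep the old mapping unless the
        # hypothetical "new_map" option is given.
        return "new_map" not in opts
    # Behavior described in the commit: new mapping by default,
    # "efi=old_map" falls back to the old one.
    return "old_map" in opts


print(use_old_map("root=/dev/sda1 efi=old_map"))
print(use_old_map("root=/dev/sda1", new_map_opt_in=True))
```

With the inverted policy, a machine booted without any efi= option would
stay on the old mapping, which is the case that matters for us.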

Any input on the issue would be greatly appreciated.  Thanks!

- Alex


end of thread, other threads:[~2014-02-11 22:36 UTC | newest]

Thread overview: 14+ messages
     [not found] <52E2ADB1.2030007@zytor.com>
     [not found] ` <20140124183730.GC11788@pd.tnic>
     [not found]   ` <20140124184842.GD11788@pd.tnic>
     [not found]     ` <20140124191709.GT18196@sgi.com>
     [not found]       ` <20140127222129.GK6839@pd.tnic>
     [not found]         ` <20140128110552.GA815@pd.tnic>
     [not found]           ` <20140128200754.GZ18196@sgi.com>
     [not found]             ` <20140128225905.GN815@pd.tnic>
     [not found]               ` <20140128234036.GB18196@sgi.com>
     [not found]                 ` <20140129141822.GC24887@pd.tnic>
2014-01-30 22:19                   ` [BUG] Linux 3.14 fails to boot with new EFI changes Alex Thorlton
2014-01-30 22:23                     ` H. Peter Anvin
2014-01-31 10:07                       ` Borislav Petkov
2014-01-31 14:02                         ` Russ Anderson
2014-01-31 14:23                           ` Borislav Petkov
2014-01-31 14:36                             ` Borislav Petkov
2014-02-05 21:45                             ` Alex Thorlton
2014-02-05 23:15                               ` Borislav Petkov
2014-02-11 22:19                                 ` Alex Thorlton
2014-02-11 22:36                                   ` Borislav Petkov
2014-01-31  8:04                     ` Matt Fleming
2014-01-31 13:53                       ` Russ Anderson
2014-01-23 22:11 Alex Thorlton
2014-01-23 22:48 ` Borislav Petkov
