* [PATCH] x86: setup: extend low identity map to cover whole kernel range @ 2015-10-14 11:30 Paolo Bonzini 2015-10-14 13:52 ` Matt Fleming 0 siblings, 1 reply; 9+ messages in thread From: Paolo Bonzini @ 2015-10-14 11:30 UTC (permalink / raw) To: linux-kernel; +Cc: hpa, x86, stable, lersek, matt.fleming, bp, linux-efi On 32-bit systems, the initial_page_table is reused by efi_call_phys_prolog as an identity map to call SetVirtualAddressMap. efi_call_phys_prolog takes care of converting the current CPU's GDT to a physical address too. For PAE kernels the identity mapping is achieved by aliasing the first PDPE for the kernel memory mapping into the first PDPE of initial_page_table. This makes the EFI stub's trick "just work". However, for non-PAE kernels there is no guarantee that the identity mapping in the initial_page_table extends as far as the GDT; in this case, accesses to the GDT will cause a page fault (which quickly becomes a triple fault). Fix this by copying the kernel mappings from swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at identity mapping. For some reason, this is only reproducible with QEMU's dynamic translation mode, and not for example with KVM. However, even under KVM one can clearly see that the page table is bogus: $ qemu-system-i386 -pflash OVMF.fd -M q35 vmlinuz0 -s -S -daemonize $ gdb (gdb) target remote localhost:1234 (gdb) hb *0x02858f6f Hardware assisted breakpoint 1 at 0x2858f6f (gdb) c Continuing. Breakpoint 1, 0x02858f6f in ?? () (gdb) monitor info registers ... GDT= 0724e000 000000ff IDT= fffbb000 000007ff CR0=0005003b CR2=ff896000 CR3=032b7000 CR4=00000690 ... The page directory is sane: (gdb) x/4wx 0x32b7000 0x32b7000: 0x03398063 0x03399063 0x0339a063 0x0339b063 (gdb) x/4wx 0x3398000 0x3398000: 0x00000163 0x00001163 0x00002163 0x00003163 (gdb) x/4wx 0x3399000 0x3399000: 0x00400003 0x00401003 0x00402003 0x00403003 but our particular page directory entry is empty: (gdb) x/1wx 0x32b7000 + (0x724e000 >> 22) * 4 0x32b7070: 0x00000000 Reported-by: Laszlo Ersek <lersek@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- arch/x86/kernel/setup.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 80f874bf999e..24154bd12307 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -1198,6 +1198,14 @@ void __init setup_arch(char **cmdline_p) clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY, swapper_pg_dir + KERNEL_PGD_BOUNDARY, KERNEL_PGD_PTRS); + + /* + * sync back low identity map too. It is used for example + * in the 32-bit EFI stub. + */ + clone_pgd_range(initial_page_table, + swapper_pg_dir + KERNEL_PGD_BOUNDARY, + KERNEL_PGD_PTRS); #endif tboot_probe(); -- 2.5.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] x86: setup: extend low identity map to cover whole kernel range 2015-10-14 11:30 [PATCH] x86: setup: extend low identity map to cover whole kernel range Paolo Bonzini @ 2015-10-14 13:52 ` Matt Fleming 2015-10-14 14:29 ` Paolo Bonzini [not found] ` <20151014135211.GB2782-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> 0 siblings, 2 replies; 9+ messages in thread From: Matt Fleming @ 2015-10-14 13:52 UTC (permalink / raw) To: Paolo Bonzini Cc: linux-kernel, hpa, x86, stable, lersek, matt.fleming, bp, linux-efi, Andy Lutomirski (Pulling in luto for low-level x86 fu) On Wed, 14 Oct, at 01:30:45PM, Paolo Bonzini wrote: > On 32-bit systems, the initial_page_table is reused by > efi_call_phys_prolog as an identity map to call > SetVirtualAddressMap. efi_call_phys_prolog takes care of > converting the current CPU's GDT to a physical address too. > > For PAE kernels the identity mapping is achieved by aliasing the > first PDPE for the kernel memory mapping into the first PDPE > of initial_page_table. This makes the EFI stub's trick "just work". > > However, for non-PAE kernels there is no guarantee that the identity > mapping in the initial_page_table extends as far as the GDT; in this > case, accesses to the GDT will cause a page fault (which quickly becomes > a triple fault). Fix this by copying the kernel mappings from > swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at > identity mapping. Oops, good catch guys. This is clearly a bug, but... > For some reason, this is only reproducible with QEMU's dynamic translation > mode, and not for example with KVM. However, even under KVM one can clearly > see that the page table is bogus: > > $ qemu-system-i386 -pflash OVMF.fd -M q35 vmlinuz0 -s -S -daemonize > $ gdb > (gdb) target remote localhost:1234 > (gdb) hb *0x02858f6f > Hardware assisted breakpoint 1 at 0x2858f6f > (gdb) c > Continuing. > > Breakpoint 1, 0x02858f6f in ?? () > (gdb) monitor info registers > ... > GDT= 0724e000 000000ff > IDT= fffbb000 000007ff > CR0=0005003b CR2=ff896000 CR3=032b7000 CR4=00000690 > ... > > The page directory is sane: > > (gdb) x/4wx 0x32b7000 > 0x32b7000: 0x03398063 0x03399063 0x0339a063 0x0339b063 > (gdb) x/4wx 0x3398000 > 0x3398000: 0x00000163 0x00001163 0x00002163 0x00003163 > (gdb) x/4wx 0x3399000 > 0x3399000: 0x00400003 0x00401003 0x00402003 0x00403003 > > but our particular page directory entry is empty: > > (gdb) x/1wx 0x32b7000 + (0x724e000 >> 22) * 4 > 0x32b7070: 0x00000000 ... I'm a little surprised you managed to trigger this at all, because the GDT we load in efi_call_phys_prolog() is part of the per-cpu data section and therefore part of the kernel image. The kernel image *is* mapped in the identity range, even for non-PAE kernels. So yes, you're right this is a bug but I'm not sure how you're actually triggering it. > Reported-by: Laszlo Ersek <lersek@redhat.com> > Cc: stable@vger.kernel.org > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > --- > arch/x86/kernel/setup.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c > index 80f874bf999e..24154bd12307 100644 > --- a/arch/x86/kernel/setup.c > +++ b/arch/x86/kernel/setup.c > @@ -1198,6 +1198,14 @@ void __init setup_arch(char **cmdline_p) > clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY, > swapper_pg_dir + KERNEL_PGD_BOUNDARY, > KERNEL_PGD_PTRS); > + > + /* > + * sync back low identity map too. It is used for example > + * in the 32-bit EFI stub. > + */ > + clone_pgd_range(initial_page_table, > + swapper_pg_dir + KERNEL_PGD_BOUNDARY, > + KERNEL_PGD_PTRS); > #endif > > tboot_probe(); > -- > 2.5.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-efi" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Matt Fleming, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86: setup: extend low identity map to cover whole kernel range 2015-10-14 13:52 ` Matt Fleming @ 2015-10-14 14:29 ` Paolo Bonzini 2015-10-14 21:04 ` Matt Fleming [not found] ` <20151014135211.GB2782-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> 1 sibling, 1 reply; 9+ messages in thread From: Paolo Bonzini @ 2015-10-14 14:29 UTC (permalink / raw) To: Matt Fleming Cc: linux-kernel, hpa, x86, stable, lersek, matt.fleming, bp, linux-efi, Andy Lutomirski On 14/10/2015 15:52, Matt Fleming wrote: >> > However, for non-PAE kernels there is no guarantee that the identity >> > mapping in the initial_page_table extends as far as the GDT; in this >> > case, accesses to the GDT will cause a page fault (which quickly becomes >> > a triple fault). Fix this by copying the kernel mappings from >> > swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at >> > identity mapping. > > Oops, good catch guys. This is clearly a bug, but... > > ... I'm a little surprised you managed to trigger this at all, because > the GDT we load in efi_call_phys_prolog() is part of the per-cpu data > section and therefore part of the kernel image. Only until setup_percpu, which is earlier than SetVirtualAddressMap. For example, I get: setup_percpu: NR_CPUS:8 nr_cpumask_bits:8 nr_cpu_ids:1 nr_node_ids:1 PERCPU: Embedded 18 pages/cpu @c728e000 s41800 r0 d31928 u73728 ^^^^^^^ but the kernel image ends at 0x037fffff. The GDT is 0xc728e000 in this run, so the GDT is at the beginning of the relocated percpu area. In the above run, the FS base that switch_to_new_gdt loads is 0x551C000. You have 0x728E000 - 0x551C000 = 0x1D72000, and from tracing I see that one of the GDT values that is loaded very early is exactly 0xC1D72000. That _is_ inside the kernel image of course when you remove PAGE_OFFSET. Paolo ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86: setup: extend low identity map to cover whole kernel range 2015-10-14 14:29 ` Paolo Bonzini @ 2015-10-14 21:04 ` Matt Fleming 0 siblings, 0 replies; 9+ messages in thread From: Matt Fleming @ 2015-10-14 21:04 UTC (permalink / raw) To: Paolo Bonzini Cc: linux-kernel, hpa, x86, stable, lersek, matt.fleming, bp, linux-efi, Andy Lutomirski On Wed, 14 Oct, at 04:29:33PM, Paolo Bonzini wrote: > > On 14/10/2015 15:52, Matt Fleming wrote: > >> > However, for non-PAE kernels there is no guarantee that the identity > >> > mapping in the initial_page_table extends as far as the GDT; in this > >> > case, accesses to the GDT will cause a page fault (which quickly becomes > >> > a triple fault). Fix this by copying the kernel mappings from > >> > swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at > >> > identity mapping. > > > > Oops, good catch guys. This is clearly a bug, but... > > > > ... I'm a little surprised you managed to trigger this at all, because > > the GDT we load in efi_call_phys_prolog() is part of the per-cpu data > > section and therefore part of the kernel image. > > Only until setup_percpu, which is earlier than SetVirtualAddressMap. > For example, I get: > > setup_percpu: NR_CPUS:8 nr_cpumask_bits:8 nr_cpu_ids:1 nr_node_ids:1 > PERCPU: Embedded 18 pages/cpu @c728e000 s41800 r0 d31928 u73728 > ^^^^^^^ > but the kernel image ends at 0x037fffff. > > The GDT is 0xc728e000 in this run, so the GDT is at the beginning of the > relocated percpu area. Ah, good point. I completely missed that the percpu sections get relocated. -- Matt Fleming, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <20151014135211.GB2782-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>]
* Re: [PATCH] x86: setup: extend low identity map to cover whole kernel range [not found] ` <20151014135211.GB2782-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> @ 2015-10-14 16:22 ` Andy Lutomirski 2015-10-14 21:00 ` Matt Fleming 0 siblings, 1 reply; 9+ messages in thread From: Andy Lutomirski @ 2015-10-14 16:22 UTC (permalink / raw) To: Matt Fleming Cc: Paolo Bonzini, linux-kernel-u79uwXL29TY76Z2rM5mHXA, H. Peter Anvin, X86 ML, stable, Laszlo Ersek, Matt Fleming, Borislav Petkov, linux-efi-u79uwXL29TY76Z2rM5mHXA On Wed, Oct 14, 2015 at 6:52 AM, Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote: > (Pulling in luto for low-level x86 fu) > > On Wed, 14 Oct, at 01:30:45PM, Paolo Bonzini wrote: >> On 32-bit systems, the initial_page_table is reused by >> efi_call_phys_prolog as an identity map to call >> SetVirtualAddressMap. efi_call_phys_prolog takes care of >> converting the current CPU's GDT to a physical address too. >> >> For PAE kernels the identity mapping is achieved by aliasing the >> first PDPE for the kernel memory mapping into the first PDPE >> of initial_page_table. This makes the EFI stub's trick "just work". >> >> However, for non-PAE kernels there is no guarantee that the identity >> mapping in the initial_page_table extends as far as the GDT; in this >> case, accesses to the GDT will cause a page fault (which quickly becomes >> a triple fault). Fix this by copying the kernel mappings from >> swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at >> identity mapping. > > Oops, good catch guys. This is clearly a bug, but... > >> For some reason, this is only reproducible with QEMU's dynamic translation >> mode, and not for example with KVM. However, even under KVM one can clearly >> see that the page table is bogus: I haven't looked at the code, but it wouldn't surprise me if this is some kind of TLB issue. With the hardware TLB (which is in use on KVM), it seems quite likely that the GDT is pretty much always in the TLB and, if nothing flushes global mappings, then it'll probably stick around. --Andy ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86: setup: extend low identity map to cover whole kernel range 2015-10-14 16:22 ` Andy Lutomirski @ 2015-10-14 21:00 ` Matt Fleming [not found] ` <20151014210050.GE2782-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> 0 siblings, 1 reply; 9+ messages in thread From: Matt Fleming @ 2015-10-14 21:00 UTC (permalink / raw) To: Andy Lutomirski Cc: Paolo Bonzini, linux-kernel, H. Peter Anvin, X86 ML, stable, Laszlo Ersek, Matt Fleming, Borislav Petkov, linux-efi On Wed, 14 Oct, at 09:22:03AM, Andy Lutomirski wrote: > On Wed, Oct 14, 2015 at 6:52 AM, Matt Fleming <matt@codeblueprint.co.uk> wrote: > > (Pulling in luto for low-level x86 fu) > > > > On Wed, 14 Oct, at 01:30:45PM, Paolo Bonzini wrote: > >> On 32-bit systems, the initial_page_table is reused by > >> efi_call_phys_prolog as an identity map to call > >> SetVirtualAddressMap. efi_call_phys_prolog takes care of > >> converting the current CPU's GDT to a physical address too. > >> > >> For PAE kernels the identity mapping is achieved by aliasing the > >> first PDPE for the kernel memory mapping into the first PDPE > >> of initial_page_table. This makes the EFI stub's trick "just work". > >> > >> However, for non-PAE kernels there is no guarantee that the identity > >> mapping in the initial_page_table extends as far as the GDT; in this > >> case, accesses to the GDT will cause a page fault (which quickly becomes > >> a triple fault). Fix this by copying the kernel mappings from > >> swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at > >> identity mapping. > > > > Oops, good catch guys. This is clearly a bug, but... > > > >> For some reason, this is only reproducible with QEMU's dynamic translation > >> mode, and not for example with KVM. However, even under KVM one can clearly > >> see that the page table is bogus: > > I haven't looked at the code, but it wouldn't surprise me if this is > some kind of TLB issue. With the hardware TLB (which is in use on > KVM), it seems quite likely that the GDT is pretty much always in the > TLB and, if nothing flushes global mappings, then it'll probably stick > around. >From some quick experiments it appears that you can skate past this issue if you don't receive any interrupts while the bogus GDT pointer is loaded, or if you avoid reloading the segment registers in general. Which is interesting because I assumed that writing to GDTR took immediate effect. Up until commit 23a0d4e8fa6d ("efi: Disable interrupts around EFI calls, not in the epilog/prolog calls") interrupts were disabled around the prolog and epilog calls, and the functional GDT was re-installed before interrupts were re-enabled. That does explain why no one has complained about this issue before. -- Matt Fleming, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <20151014210050.GE2782-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>]
* Re: [PATCH] x86: setup: extend low identity map to cover whole kernel range [not found] ` <20151014210050.GE2782-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> @ 2015-10-14 21:39 ` Andy Lutomirski 2015-10-15 9:45 ` Matt Fleming [not found] ` <CALCETrU=YL8yWpp29xO0N7TEVogX1j5Fyk5M_FpJTa9ZOS21Zw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 2 replies; 9+ messages in thread From: Andy Lutomirski @ 2015-10-14 21:39 UTC (permalink / raw) To: Matt Fleming Cc: Paolo Bonzini, linux-kernel-u79uwXL29TY76Z2rM5mHXA, H. Peter Anvin, X86 ML, stable, Laszlo Ersek, Matt Fleming, Borislav Petkov, linux-efi-u79uwXL29TY76Z2rM5mHXA On Wed, Oct 14, 2015 at 2:00 PM, Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote: > On Wed, 14 Oct, at 09:22:03AM, Andy Lutomirski wrote: >> On Wed, Oct 14, 2015 at 6:52 AM, Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote: >> > (Pulling in luto for low-level x86 fu) >> > >> > On Wed, 14 Oct, at 01:30:45PM, Paolo Bonzini wrote: >> >> On 32-bit systems, the initial_page_table is reused by >> >> efi_call_phys_prolog as an identity map to call >> >> SetVirtualAddressMap. efi_call_phys_prolog takes care of >> >> converting the current CPU's GDT to a physical address too. >> >> >> >> For PAE kernels the identity mapping is achieved by aliasing the >> >> first PDPE for the kernel memory mapping into the first PDPE >> >> of initial_page_table. This makes the EFI stub's trick "just work". >> >> >> >> However, for non-PAE kernels there is no guarantee that the identity >> >> mapping in the initial_page_table extends as far as the GDT; in this >> >> case, accesses to the GDT will cause a page fault (which quickly becomes >> >> a triple fault). Fix this by copying the kernel mappings from >> >> swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at >> >> identity mapping. >> > >> > Oops, good catch guys. This is clearly a bug, but... >> > >> >> For some reason, this is only reproducible with QEMU's dynamic translation >> >> mode, and not for example with KVM. However, even under KVM one can clearly >> >> see that the page table is bogus: >> >> I haven't looked at the code, but it wouldn't surprise me if this is >> some kind of TLB issue. With the hardware TLB (which is in use on >> KVM), it seems quite likely that the GDT is pretty much always in the >> TLB and, if nothing flushes global mappings, then it'll probably stick >> around. > > From some quick experiments it appears that you can skate past this > issue if you don't receive any interrupts while the bogus GDT pointer > is loaded, or if you avoid reloading the segment registers in general. > Which is interesting because I assumed that writing to GDTR took > immediate effect. Trivia for your amusement: AFAICT it's entirely permissible for the GDTR and/or LDT descriptor to point to unmapped memory. Any attempt to use them (segment loads, interrupts, IRET, etc) will try to access that memory as if the access came from CPL 0 and, if the access fails, will generate a valid page fault with CR2 pointing into the GDT or LDT. Xen is nuts^Wclever and actually uses this. Of course, if your #PF vector references a GDT or LDT descriptor and trying to load that descriptor results in a page fault, you get a double fault. I learned this while trying to puzzle out why v1 of my LDT synchronization patch caused random faults on Xen. --Andy ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86: setup: extend low identity map to cover whole kernel range 2015-10-14 21:39 ` Andy Lutomirski @ 2015-10-15 9:45 ` Matt Fleming [not found] ` <CALCETrU=YL8yWpp29xO0N7TEVogX1j5Fyk5M_FpJTa9ZOS21Zw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 1 sibling, 0 replies; 9+ messages in thread From: Matt Fleming @ 2015-10-15 9:45 UTC (permalink / raw) To: Andy Lutomirski Cc: Paolo Bonzini, linux-kernel, H. Peter Anvin, X86 ML, stable, Laszlo Ersek, Matt Fleming, Borislav Petkov, linux-efi On Wed, 14 Oct, at 02:39:58PM, Andy Lutomirski wrote: > > Trivia for your amusement: > > AFAICT it's entirely permissible for the GDTR and/or LDT descriptor to > point to unmapped memory. Any attempt to use them (segment loads, > interrupts, IRET, etc) will try to access that memory as if the access > came from CPL 0 and, if the access fails, will generate a valid page > fault with CR2 pointing into the GDT or LDT. > > Xen is nuts^Wclever and actually uses this. > > Of course, if your #PF vector references a GDT or LDT descriptor and > trying to load that descriptor results in a page fault, you get a > double fault. > > I learned this while trying to puzzle out why v1 of my LDT > synchronization patch caused random faults on Xen. Ha, interesting! Thanks Andy, it's good to get confirmation. OK, I think we understand this issue well enough to call this fixed. I've queued the following patch up in my urgent tree and I'll send a pull request to the tip folks tomorrow. --- >From 72ac9f242fad7c3bf3e06461e34c9a546d8cb411 Mon Sep 17 00:00:00 2001 From: Paolo Bonzini <pbonzini@redhat.com> Date: Wed, 14 Oct 2015 13:30:45 +0200 Subject: [PATCH] x86/setup: Extend low identity map to cover whole kernel range On 32-bit systems, the initial_page_table is reused by efi_call_phys_prolog as an identity map to call SetVirtualAddressMap. efi_call_phys_prolog takes care of converting the current CPU's GDT to a physical address too. For PAE kernels the identity mapping is achieved by aliasing the first PDPE for the kernel memory mapping into the first PDPE of initial_page_table. This makes the EFI stub's trick "just work". However, for non-PAE kernels there is no guarantee that the identity mapping in the initial_page_table extends as far as the GDT; in this case, accesses to the GDT will cause a page fault (which quickly becomes a triple fault). Fix this by copying the kernel mappings from swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at identity mapping. For some reason, this is only reproducible with QEMU's dynamic translation mode, and not for example with KVM. However, even under KVM one can clearly see that the page table is bogus: $ qemu-system-i386 -pflash OVMF.fd -M q35 vmlinuz0 -s -S -daemonize $ gdb (gdb) target remote localhost:1234 (gdb) hb *0x02858f6f Hardware assisted breakpoint 1 at 0x2858f6f (gdb) c Continuing. Breakpoint 1, 0x02858f6f in ?? () (gdb) monitor info registers ... GDT= 0724e000 000000ff IDT= fffbb000 000007ff CR0=0005003b CR2=ff896000 CR3=032b7000 CR4=00000690 ... The page directory is sane: (gdb) x/4wx 0x32b7000 0x32b7000: 0x03398063 0x03399063 0x0339a063 0x0339b063 (gdb) x/4wx 0x3398000 0x3398000: 0x00000163 0x00001163 0x00002163 0x00003163 (gdb) x/4wx 0x3399000 0x3399000: 0x00400003 0x00401003 0x00402003 0x00403003 but our particular page directory entry is empty: (gdb) x/1wx 0x32b7000 + (0x724e000 >> 22) * 4 0x32b7070: 0x00000000 [ It appears that you can skate past this issue if you don't receive any interrupts while the bogus GDT pointer is loaded, or if you avoid reloading the segment registers in general. Andy Lutomirski provides some additional insight: "AFAICT it's entirely permissible for the GDTR and/or LDT descriptor to point to unmapped memory. Any attempt to use them (segment loads, interrupts, IRET, etc) will try to access that memory as if the access came from CPL 0 and, if the access fails, will generate a valid page fault with CR2 pointing into the GDT or LDT." Up until commit 23a0d4e8fa6d ("efi: Disable interrupts around EFI calls, not in the epilog/prolog calls") interrupts were disabled around the prolog and epilog calls, and the functional GDT was re-installed before interrupts were re-enabled. Which explains why no one has hit this issue until now. ] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reported-by: Laszlo Ersek <lersek@redhat.com> Cc: <stable@vger.kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Matt Fleming <matt.fleming@intel.com> [ Updated changelog. ] --- arch/x86/kernel/setup.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index fdb7f2a2d328..a3cccbfc5f77 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -1173,6 +1173,14 @@ void __init setup_arch(char **cmdline_p) clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY, swapper_pg_dir + KERNEL_PGD_BOUNDARY, KERNEL_PGD_PTRS); + + /* + * sync back low identity map too. It is used for example + * in the 32-bit EFI stub. + */ + clone_pgd_range(initial_page_table, + swapper_pg_dir + KERNEL_PGD_BOUNDARY, + KERNEL_PGD_PTRS); #endif tboot_probe(); -- 2.1.0 -- Matt Fleming, Intel Open Source Technology Center ^ permalink raw reply related [flat|nested] 9+ messages in thread
[parent not found: <CALCETrU=YL8yWpp29xO0N7TEVogX1j5Fyk5M_FpJTa9ZOS21Zw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] x86: setup: extend low identity map to cover whole kernel range [not found] ` <CALCETrU=YL8yWpp29xO0N7TEVogX1j5Fyk5M_FpJTa9ZOS21Zw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2015-10-15 12:18 ` H. Peter Anvin 0 siblings, 0 replies; 9+ messages in thread From: H. Peter Anvin @ 2015-10-15 12:18 UTC (permalink / raw) To: Andy Lutomirski, Matt Fleming Cc: Paolo Bonzini, linux-kernel-u79uwXL29TY76Z2rM5mHXA, X86 ML, stable, Laszlo Ersek, Matt Fleming, Borislav Petkov, linux-efi-u79uwXL29TY76Z2rM5mHXA On October 14, 2015 2:39:58 PM PDT, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote: >On Wed, Oct 14, 2015 at 2:00 PM, Matt Fleming ><matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote: >> On Wed, 14 Oct, at 09:22:03AM, Andy Lutomirski wrote: >>> On Wed, Oct 14, 2015 at 6:52 AM, Matt Fleming ><matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote: >>> > (Pulling in luto for low-level x86 fu) >>> > >>> > On Wed, 14 Oct, at 01:30:45PM, Paolo Bonzini wrote: >>> >> On 32-bit systems, the initial_page_table is reused by >>> >> efi_call_phys_prolog as an identity map to call >>> >> SetVirtualAddressMap. efi_call_phys_prolog takes care of >>> >> converting the current CPU's GDT to a physical address too. >>> >> >>> >> For PAE kernels the identity mapping is achieved by aliasing the >>> >> first PDPE for the kernel memory mapping into the first PDPE >>> >> of initial_page_table. This makes the EFI stub's trick "just >work". >>> >> >>> >> However, for non-PAE kernels there is no guarantee that the >identity >>> >> mapping in the initial_page_table extends as far as the GDT; in >this >>> >> case, accesses to the GDT will cause a page fault (which quickly >becomes >>> >> a triple fault). Fix this by copying the kernel mappings from >>> >> swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET >and at >>> >> identity mapping. >>> > >>> > Oops, good catch guys. This is clearly a bug, but... >>> > >>> >> For some reason, this is only reproducible with QEMU's dynamic >translation >>> >> mode, and not for example with KVM. However, even under KVM one >can clearly >>> >> see that the page table is bogus: >>> >>> I haven't looked at the code, but it wouldn't surprise me if this is >>> some kind of TLB issue. With the hardware TLB (which is in use on >>> KVM), it seems quite likely that the GDT is pretty much always in >the >>> TLB and, if nothing flushes global mappings, then it'll probably >stick >>> around. >> >> From some quick experiments it appears that you can skate past this >> issue if you don't receive any interrupts while the bogus GDT pointer >> is loaded, or if you avoid reloading the segment registers in >general. >> Which is interesting because I assumed that writing to GDTR took >> immediate effect. > >Trivia for your amusement: > >AFAICT it's entirely permissible for the GDTR and/or LDT descriptor to >point to unmapped memory. Any attempt to use them (segment loads, >interrupts, IRET, etc) will try to access that memory as if the access >came from CPL 0 and, if the access fails, will generate a valid page >fault with CR2 pointing into the GDT or LDT. > >Xen is nuts^Wclever and actually uses this. > >Of course, if your #PF vector references a GDT or LDT descriptor and >trying to load that descriptor results in a page fault, you get a >double fault. > >I learned this while trying to puzzle out why v1 of my LDT >synchronization patch caused random faults on Xen. > >--Andy There is no "if"... you can't get to an interrupt vector without going through the GDT or LDT. That being said, the GDT or LDT can be partially mapped. -- Sent from my Android device with K-9 Mail. Please excuse my brevity. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2015-10-15 12:18 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-10-14 11:30 [PATCH] x86: setup: extend low identity map to cover whole kernel range Paolo Bonzini 2015-10-14 13:52 ` Matt Fleming 2015-10-14 14:29 ` Paolo Bonzini 2015-10-14 21:04 ` Matt Fleming [not found] ` <20151014135211.GB2782-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> 2015-10-14 16:22 ` Andy Lutomirski 2015-10-14 21:00 ` Matt Fleming [not found] ` <20151014210050.GE2782-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> 2015-10-14 21:39 ` Andy Lutomirski 2015-10-15 9:45 ` Matt Fleming [not found] ` <CALCETrU=YL8yWpp29xO0N7TEVogX1j5Fyk5M_FpJTa9ZOS21Zw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2015-10-15 12:18 ` H. Peter Anvin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).