All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Jan Beulich" <JBeulich@suse.com>
To: Daniel Kiper <daniel.kiper@oracle.com>
Cc: Juergen Gross <JGross@suse.com>,
	sstabellini@kernel.org, andrew.cooper3@citrix.com,
	cardoe@cardoe.com, pgnet.dev@gmail.com, ning.sun@intel.com,
	david.vrabel@citrix.com, xen-devel@lists.xenproject.org,
	qiaowei.ren@intel.com, gang.wei@intel.com, fu.wei@linaro.org
Subject: Re: [PATCH v5 15/16] x86: make Xen early boot code relocatable
Date: Thu, 25 Aug 2016 08:41:39 -0600	[thread overview]
Message-ID: <57BF1FC3020000780010915B@prv-mh.provo.novell.com> (raw)
In-Reply-To: <1471646606-28519-16-git-send-email-daniel.kiper@oracle.com>

>>> On 20.08.16 at 00:43, <daniel.kiper@oracle.com> wrote:
> Every multiboot protocol (regardless of version) compatible image must
> specify its load address (in ELF or multiboot header). Multiboot protocol
> compatible loader have to load image at specified address. However, there
> is no guarantee that the requested memory region (in case of Xen it starts
> at 1 MiB and ends at 17 MiB) where image should be loaded initially is a RAM
> and it is free (legacy BIOS platforms are merciful for Xen but I found at
> least one EFI platform on which Xen load address conflicts with EFI boot
> services; it is Dell PowerEdge R820 with latest firmware). To cope with that
> problem we must make Xen early boot code relocatable and help boot loader to
> relocate image in proper way by suggesting, not requesting specific load
> addresses as it is right now, allowed address ranges. This patch does former.
> It does not add multiboot2 protocol interface which is done in "x86: add
> multiboot2 protocol support for relocatable images" patch.
> 
> This patch changes following things:
>   - default load address is changed from 1 MiB to 2 MiB; I did that because
>     initial page tables are using 2 MiB huge pages and this way required
>     updates for them are quite easy; it means that e.g. we avoid spacial
>     cases for start and end of required memory region if it live at address
>     not aligned to 2 MiB,

Should this be a separate change then, to separate possible
regressions due to that from such due to any other of the changes
here? And then I don't see why, when making the image relocatable
anyway, the link time load address actually still matters.

>   - %esi and %r15d registers are used as a storage for Xen image load base
>     address (%r15d shortly because %rsi is used for EFI SystemTable address
>     in 64-bit code); both registers are (%esi is mostly) unused in early
>     boot code and preserved during C functions calls,

In a context where you (also) talk about 64-bit code, %esi can't
be called preserved unilaterally. You should make clear that this is
for 32-bit function calls.

> --- a/xen/arch/x86/boot/head.S
> +++ b/xen/arch/x86/boot/head.S
> @@ -12,13 +12,16 @@
>          .text
>          .code32
>  
> -#define sym_phys(sym)     ((sym) - __XEN_VIRT_START)
> +#define sym_offset(sym)   ((sym) - __XEN_VIRT_START)

In a patch already large and hence hard to review, did you really
need to do this rename?

> @@ -126,26 +130,26 @@ vga_text_buffer:
>          .section .init.text, "ax", @progbits
>  
>  bad_cpu:
> -        mov     $(sym_phys(.Lbad_cpu_msg)),%esi # Error message
> +        lea     esi_offset(.Lbad_cpu_msg),%esi  # Error message

Why not just

        add     $sym_offset(.Lbad_cpu_msg),%esi  # Error message

? Together with not doing said rename, this would be more obviously
correct.

> @@ -409,36 +436,93 @@ trampoline_bios_setup:
>          cmovb   %edx,%ecx           /* and use the smaller */
>  
>  trampoline_setup:
> +        /*
> +         * Called on legacy BIOS and EFI platforms.
> +         *
> +         * Compute 0-15 bits of BOOT_FS segment descriptor base address.
> +         */
> +        mov     %esi,%edx
> +        shl     $16,%edx
> +        or      %edx,BOOT_FS+esi_offset(trampoline_gdt)

	mov   %si,BOOT_FS+esi_offset(trampoline_gdt)

> +        /* Compute 16-23 bits of BOOT_FS segment descriptor base address. */
> +        mov     %esi,%edx
> +        shr     $16,%edx
> +        and     $0x000000ff,%edx
> +        or      %edx,BOOT_FS+4+esi_offset(trampoline_gdt)

	mov   %dl,BOOT_FS+4+esi_offset(trampoline_gdt)

> +        /* Compute 24-31 bits of BOOT_FS segment descriptor base address. */
> +        mov     %esi,%edx
> +        and     $0xff000000,%edx
> +        or      %edx,BOOT_FS+4+esi_offset(trampoline_gdt)

	mov   %dh,BOOT_FS+7+esi_offset(trampoline_gdt)

(with various of the intermediate instructions dropped)

That'll reduce you code size concern for the GDT setup quite a bit.

> +        /*
> +         * Initialise %fs and later use it to access Xen data if possible.
> +         * According to Intel 64 and IA-32 Architectures Software Developer’s
> +         * Manual it is safe to do that without reloading GDTR before.
> +         *
> +         * Please check Intel 64 and IA-32 Architectures Software Developer’s
> +         * Manual, Volume 2 (2A, 2B & 2C): Instruction Set Reference,
> +         * LGDT and MOV instructions description and
> +         * Intel 64 and IA-32 Architectures Software Developer’s
> +         * Manual Volume 3 (3A, 3B & 3C): System Programming Guide,
> +         * section 3.4.3, Segment Registers for more details.
> +         *
> +         * AIUI, only GDT address and limit are loaded into GDTR when
> +         * lgdt is executed. Segment descriptor is loaded directly from
> +         * memory into segment register (hiden part) only when relevant
> +         * load instruction is used (e.g. mov %edx,%fs). Though GDT content
> +         * probably could be stored in CPU cache but nothing suggest that
> +         * CPU caching interfere in one way or another with segment descriptor
> +         * load. So, it looks that every change in active GDT is immediately
> +         * available for relevant segment descriptor load instruction.
> +         *
> +         * I was not able to find anything which invalidates above.
> +         * So, everything suggest that we do not need an extra lgdt here.
> +         */

All of this comment except for the first sentence is just stating basic
architecture facts. Please drop all that.

> +        mov     $BOOT_FS,%edx
> +        mov     %edx,%fs
> +
>          /* Reserve 64kb for the trampoline. */
>          sub     $0x1000,%ecx
>  
>          /* From arch/x86/smpboot.c: start_eip had better be page-aligned! */
>          xor     %cl, %cl
>          shl     $4, %ecx
> -        mov     %ecx,sym_phys(trampoline_phys)
> +        mov     %ecx,fs_offset(trampoline_phys)

Seeing the first instance, together with the earlier comment about
not renaming sym_phys() I think this would end up more consistently
with using sym_fs() (and then obviously sym_esi()) as names.

> @@ -461,62 +545,88 @@ trampoline_setup:
>  
>          /* Stash TSC to calculate a good approximation of time-since-boot */
>          rdtsc
> -        mov     %eax,sym_phys(boot_tsc_stamp)
> -        mov     %edx,sym_phys(boot_tsc_stamp+4)
> +        mov     %eax,fs_offset(boot_tsc_stamp)
> +        mov     %edx,fs_offset(boot_tsc_stamp)+4
> +
> +        /* Update frame addresses in page tables. */
> +        mov     $((__page_tables_end-__page_tables_start)/8),%ecx
> +1:      testl   $_PAGE_PRESENT,fs_offset(__page_tables_start)-8(,%ecx,8)
> +        jz      2f
> +        add     %esi,fs_offset(__page_tables_start)-8(,%ecx,8)
> +2:      loop    1b

This loop includes the mapping of the low Mb, which I don't think it
should modify. Or wait - you move __page_tables_start, which by
itself is fragile (but looks to be acceptable if accompanied by a
comment explaining why it doesn't cover l1_identmap).

Also, just to double check - all these page table setup adjustments
don't require reflecting in xen.efi's page table setup code?

>          /* Apply relocations to bootstrap trampoline. */
> -        mov     sym_phys(trampoline_phys),%edx
> -        mov     $sym_phys(__trampoline_rel_start),%edi
> +        mov     fs_offset(trampoline_phys),%edx
> +        mov     $sym_offset(__trampoline_rel_start),%edi
> +        mov     $sym_offset(__trampoline_rel_stop),%ebx
>  1:
> -        mov     (%edi),%eax
> -        add     %edx,(%edi,%eax)
> +        mov     %fs:(%edi),%eax
> +        add     %edx,%fs:(%edi,%eax)
>          add     $4,%edi
> -        cmp     $sym_phys(__trampoline_rel_stop),%edi
> +        cmp     %ebx,%edi
>          jb      1b

And again it looks like the switch to using %ebx here is not needed.

>          /* Patch in the trampoline segment. */
>          shr     $4,%edx
> -        mov     $sym_phys(__trampoline_seg_start),%edi
> +        mov     $sym_offset(__trampoline_seg_start),%edi
> +        mov     $sym_offset(__trampoline_seg_stop),%ebx
>  1:
> -        mov     (%edi),%eax
> -        mov     %dx,(%edi,%eax)
> +        mov     %fs:(%edi),%eax
> +        mov     %dx,%fs:(%edi,%eax)
>          add     $4,%edi
> -        cmp     $sym_phys(__trampoline_seg_stop),%edi
> +        cmp     %ebx,%edi
>          jb      1b

Same here then, obviously.

> --- a/xen/arch/x86/boot/trampoline.S
> +++ b/xen/arch/x86/boot/trampoline.S
> @@ -54,12 +54,20 @@ trampoline_gdt:
>          /* 0x0028: real-mode data @ BOOT_TRAMPOLINE */
>          .long   0x0000ffff
>          .long   0x00009200
> +        /*
> +         * 0x0030: ring 0 Xen data, 16 MiB size, base
> +         * address is computed during runtime.
> +         */
> +        .quad   0x00c0920000001000

This does not look like it covers 1Mb, it's more like 1Mb+4k-1.

>          .pushsection .trampoline_rel, "a"
>          .long   trampoline_gdt + BOOT_PSEUDORM_CS + 2 - .
>          .long   trampoline_gdt + BOOT_PSEUDORM_DS + 2 - .
>          .popsection
>  
> +GLOBAL(xen_img_load_base_addr)
> +        .long   0

I've yet to understand the purpose of this symbol, but in any case:
If the trampoline code doesn't reference it, why does it get placed
here?

> @@ -861,15 +860,17 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>          highmem_start &= ~((1UL << L3_PAGETABLE_SHIFT) - 1);
>  #endif
>  
> +    /* Do not relocate Xen image if boot loader did work for us. */
> +    if ( xen_img_load_base_addr )
> +        xen_phys_start = xen_img_load_base_addr;

Okay, with this change the question really is: Why do you need the
new symbol? I.e. why can't you just use xen_phys_start, just like I
managed to re-use it when I introduced boot from EFI?

>      for ( i = boot_e820.nr_map-1; i >= 0; i-- )
>      {
>          uint64_t s, e, mask = (1UL << L2_PAGETABLE_SHIFT) - 1;
>          uint64_t end, limit = ARRAY_SIZE(l2_identmap) << L2_PAGETABLE_SHIFT;
>  
> -        /* Superpage-aligned chunks from BOOTSTRAP_MAP_BASE. */

I can see that you want to get rid of BOOTSTRAP_MAP_BASE, but
please don't delete the comment as a whole.

>          s = (boot_e820.map[i].addr + mask) & ~mask;
>          e = (boot_e820.map[i].addr + boot_e820.map[i].size) & ~mask;
> -        s = max_t(uint64_t, s, BOOTSTRAP_MAP_BASE);
>          if ( (boot_e820.map[i].type != E820_RAM) || (s >= e) )
>              continue;

This means you now map memory between 2Mb and 16Mb here. Is
this necessary?

> @@ -901,7 +902,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>              l4_pgentry_t *pl4e;
>              l3_pgentry_t *pl3e;
>              l2_pgentry_t *pl2e;
> -            uint64_t load_start;
>              int i, j, k;
>  
>              /* Select relocation address. */
> @@ -915,9 +915,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>               * with a barrier(). After this we must *not* modify static/global
>               * data until after we have switched to the relocated pagetables!
>               */
> -            load_start = (unsigned long)_start - XEN_VIRT_START;
>              barrier();
> -            move_memory(e + load_start, load_start, _end - _start, 1);
> +            move_memory(e + XEN_IMG_OFFSET, XEN_IMG_OFFSET, _end - _start, 1);

Assuming _start - XEN_VIRT_START == XEN_IMG_OFFSET, is this
change necessary? Or is it rather just simplification, which again
should be split from an already complex patch.

> @@ -957,15 +956,14 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>               * Undo the temporary-hooking of the l1_identmap.  __2M_text_start
>               * is contained in this PTE.
>               */
> -            BUG_ON(l2_table_offset((unsigned long)_erodata) ==
> -                   l2_table_offset((unsigned long)_stext));

At least when using_2M_mapping() this surely ought to hold?

> @@ -1019,6 +1017,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>                  : "memory" );
>  
>              bootstrap_map(NULL);
> +
> +            printk("New Xen image base address: 0x%08lx\n", xen_phys_start);

I don't see the motivation for adding such a message in this patch,
but if, then please use %#lx.

> @@ -1082,6 +1082,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>  
>      if ( !xen_phys_start )
>          panic("Not enough memory to relocate Xen.");
> +
>      reserve_e820_ram(&boot_e820, __pa(&_start), __pa(&_end));

Another stray change.

> --- a/xen/arch/x86/xen.lds.S
> +++ b/xen/arch/x86/xen.lds.S
> @@ -55,7 +55,7 @@ SECTIONS
>    __2M_text_start = .;         /* Start of 2M superpages, mapped RX. */
>  #endif
>  
> -  . = __XEN_VIRT_START + MB(1);
> +  . = __XEN_VIRT_START + XEN_IMG_OFFSET;
>    _start = .;
>    .text : {
>          _stext = .;            /* Text and read-only data */
> @@ -257,12 +257,14 @@ SECTIONS
>    .reloc : {
>      *(.reloc)
>    } :text
> -  /* Trick the linker into setting the image size to exactly 16Mb. */
>    . = ALIGN(__section_alignment__);
> +#endif
> +
> +  /* Trick the linker into setting the image size to exactly 16Mb. */
>    .pad : {
>      . = ALIGN(MB(16));
> +    __end_of_image__ = .;

I see that you use this symbol in xen/arch/x86/Makefile, but I again
don't follow why this logic needs to change.

> --- a/xen/include/asm-x86/page.h
> +++ b/xen/include/asm-x86/page.h
> @@ -288,7 +288,7 @@ extern root_pgentry_t 
> idle_pg_table[ROOT_PAGETABLE_ENTRIES];
>  extern l2_pgentry_t  *compat_idle_pg_table_l2;
>  extern unsigned int   m2p_compat_vstart;
>  extern l2_pgentry_t l2_xenmap[L2_PAGETABLE_ENTRIES],
> -    l2_bootmap[L2_PAGETABLE_ENTRIES];
> +    l2_bootmap[4*L2_PAGETABLE_ENTRIES];

Why do you need 4 of them? I can see why one might not suffice,
but two surely should?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

  reply	other threads:[~2016-08-25 14:41 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-19 22:43 [PATCH v5 00/16] x86: multiboot2 protocol support Daniel Kiper
2016-08-19 22:43 ` [PATCH v5 01/16] x86: allow EFI reboot method neither on EFI platforms Daniel Kiper
2016-08-25 11:19   ` Jan Beulich
2016-08-19 22:43 ` [PATCH v5 02/16] x86/boot: remove multiboot1_header_end from symbol table Daniel Kiper
2016-08-25 11:21   ` Jan Beulich
2016-08-30 14:27     ` Daniel Kiper
2016-08-30 15:11       ` Jan Beulich
2016-08-19 22:43 ` [PATCH v5 03/16] x86/boot: create *.lnk files with linker script Daniel Kiper
2016-08-25 11:28   ` Jan Beulich
2016-08-19 22:43 ` [PATCH v5 04/16] x86/boot/reloc: reduce assembly usage as much as possible Daniel Kiper
2016-08-25 11:29   ` Jan Beulich
2016-08-19 22:43 ` [PATCH v5 05/16] x86/boot: call reloc() using stdcall calling convention Daniel Kiper
2016-08-19 22:43 ` [PATCH v5 06/16] x86/boot/reloc: create generic alloc and copy functions Daniel Kiper
2016-08-25 11:34   ` Jan Beulich
2016-08-30 14:32     ` Daniel Kiper
2016-08-30 15:12       ` Jan Beulich
2016-08-31 15:13         ` Daniel Kiper
2016-08-31 15:25           ` Jan Beulich
2016-08-31 19:39             ` Daniel Kiper
2016-09-01  7:35               ` Jan Beulich
2016-09-06 15:33       ` Doug Goldstein
2016-08-19 22:43 ` [PATCH v5 07/16] x86/boot: use %ecx instead of %eax Daniel Kiper
2016-08-19 22:43 ` [PATCH v5 08/16] x86/boot/reloc: rename some variables and rearrange code a bit Daniel Kiper
2016-08-19 22:43 ` [PATCH v5 09/16] x86: add multiboot2 protocol support Daniel Kiper
2016-08-25 11:50   ` Jan Beulich
2016-08-30 14:41     ` Daniel Kiper
2016-08-30 15:14       ` Jan Beulich
2016-08-31 15:21         ` Daniel Kiper
2016-08-31 20:18   ` Andrew Cooper
2016-08-31 21:01     ` Daniel Kiper
2016-08-19 22:43 ` [PATCH v5 10/16] efi: create efi_enabled() Daniel Kiper
2016-08-25 12:16   ` Jan Beulich
2016-08-30 17:19     ` Daniel Kiper
2016-08-31 12:31       ` Jan Beulich
2016-08-19 22:43 ` [PATCH v5 11/16] efi: build xen.gz with EFI code Daniel Kiper
2016-08-25 12:23   ` Jan Beulich
2016-08-19 22:43 ` [PATCH v5 12/16] x86/efi: create new early memory allocator Daniel Kiper
2016-09-05 12:33   ` Jan Beulich
2016-09-07 12:05     ` Daniel Kiper
2016-09-07 14:01       ` Jan Beulich
2016-09-08  8:29         ` Daniel Kiper
2016-09-08  9:59           ` Jan Beulich
2016-08-19 22:43 ` [PATCH v5 13/16] x86: add multiboot2 protocol support for EFI platforms Daniel Kiper
2016-08-25 12:59   ` Jan Beulich
2016-08-30 19:32     ` Daniel Kiper
2016-08-31 12:49       ` Jan Beulich
2016-08-31 17:07         ` Daniel Kiper
2016-09-01  7:40           ` Jan Beulich
2016-09-01 20:37             ` Daniel Kiper
2016-08-19 22:43 ` [PATCH v5 14/16] x86/boot: implement early command line parser in C Daniel Kiper
2016-08-25 13:27   ` Jan Beulich
2016-08-30 19:58     ` Daniel Kiper
2016-08-31 13:01       ` Jan Beulich
2016-08-31 19:31         ` Daniel Kiper
2016-09-01  7:41           ` Jan Beulich
2016-09-01 20:43             ` Daniel Kiper
2016-08-19 22:43 ` [PATCH v5 15/16] x86: make Xen early boot code relocatable Daniel Kiper
2016-08-25 14:41   ` Jan Beulich [this message]
2016-08-31 14:59     ` Daniel Kiper
2016-08-31 15:46       ` Jan Beulich
2016-08-31 20:50         ` Daniel Kiper
2016-09-01  7:46           ` Jan Beulich
2016-09-01 21:19             ` Daniel Kiper
2016-09-02  6:58               ` Jan Beulich
2016-09-02  7:28                 ` Daniel Kiper
2016-08-19 22:43 ` [PATCH v5 16/16] x86: add multiboot2 protocol support for relocatable images Daniel Kiper
2016-08-22 10:10 ` [PATCH v5 00/16] x86: multiboot2 protocol support Jan Beulich
2016-08-30 14:15   ` Daniel Kiper
2016-08-30 15:09     ` Jan Beulich
2016-08-31 15:05       ` Daniel Kiper

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57BF1FC3020000780010915B@prv-mh.provo.novell.com \
    --to=jbeulich@suse.com \
    --cc=JGross@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=cardoe@cardoe.com \
    --cc=daniel.kiper@oracle.com \
    --cc=david.vrabel@citrix.com \
    --cc=fu.wei@linaro.org \
    --cc=gang.wei@intel.com \
    --cc=ning.sun@intel.com \
    --cc=pgnet.dev@gmail.com \
    --cc=qiaowei.ren@intel.com \
    --cc=sstabellini@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.