xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Oleksii <oleksii.kurochko@gmail.com>
To: Julien Grall <julien@xen.org>, xen-devel@lists.xenproject.org
Cc: Jan Beulich <jbeulich@suse.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	 Stefano Stabellini <sstabellini@kernel.org>,
	Gianluca Guida <gianluca@rivosinc.com>,
	Bob Eshleman <bobbyeshleman@gmail.com>,
	Alistair Francis <alistair.francis@wdc.com>,
	 Connor Davis <connojdavis@gmail.com>
Subject: Re: [PATCH v2 1/3] xen/riscv: introduce setup_initial_pages
Date: Thu, 23 Mar 2023 13:18:12 +0200	[thread overview]
Message-ID: <c9f4b34c4aef31906e715c7ddce8e077e5eef52c.camel@gmail.com> (raw)
In-Reply-To: <60444252-80b6-230e-9090-2c96d5d6187d@xen.org>

Hi Julien,

On Wed, 2023-03-22 at 14:21 +0000, Julien Grall wrote:
...
> 
> > > > +    unsigned long page_addr;
> > > > +
> > > > +    // /* align start addresses */
> > > > +    // map_start &= XEN_PT_LEVEL_MAP_MASK(0);
> > > > +    // pa_start &= XEN_PT_LEVEL_MAP_MASK(0);
> > > 
> > > They should be switched to ASSERT() or BUG_ON().
> > Sure. Thanks. I'll update.
> > > 
> > > > +
> > > > +    page_addr = map_start;
> > > > +    while ( page_addr < map_end )
> > > 
> > > This loop is continue to assume that only the mapping can first
> > > in
> > > 2MB
> > > section (or less if the start is not 2MB aligned).
> > > 
> > > I am OK if you want to assume it, but I think this should be
> > > documented
> > > (with words and ASSERT()/BUG_ON()) to avoid any mistake.
> > I add a check in setup_initial_pagetables:
> >       BUG_ON(load_addr_start % (PAGE_ENTRIES * PAGE_SIZE) != 0);
> > Probably this is not a correct place and should be moved to
> > setup_initial_mapping() instead of setup_initial_pagetables()
> 
> Yes it should be moved in setup_initial_mapping().
Then I'll moved it to setup_initial_mapping()
> 
> > > 
> > > > +    {
> > > > +        unsigned long index2 = pt_index(2, page_addr);
> > > > +        unsigned long index1 = pt_index(1, page_addr);
> > > > +        unsigned long index0 = pt_index(0, page_addr);
> > > > +
> > > > +        /* Setup level2 table */
> > > > +        second[index2] = paddr_to_pte((unsigned long)first);
> > > > +        second[index2].pte |= PTE_TABLE;
> > > > +
> > > > +        /* Setup level1 table */
> > > > +        first[index1] = paddr_to_pte((unsigned long)zeroeth);
> > > > +        first[index1].pte |= PTE_TABLE;
> > > > +
> > > > +
> > > 
> > > NIT: Spurious line?
> > Yeah, should be removed. Thanks.
> > > 
> > > > +        /* Setup level0 table */
> > > > +        if ( !pte_is_valid(&zeroeth[index0]) )
> > > 
> > > On the previous version, you said it should be checked for each
> > > level.
> > I had a terrible internet connection, and my message wasn't sent.
> 
> No worries.
> 
> > 
> > I decided not to check that l2 and l1 are used only for referring
> > to
> > the next page table but not leaf PTE. So it is safe to overwrite it
> > each time (the addresses of page tables are the same all the time)
> 
> You are letting the caller to decide which page-table to use for each
> level. So you are at the mercy that caller will do the right thing.
> 
> IHMO, this is a pretty bad idea because debugging page-tables error
> are 
> difficult. So it is better to have safety in place. This is not
> worth...
> 
>   and
> > probably it will be better from optimization point of view to
> > ignore if
> > clauses.
> 
> ... the optimization in particular when this is at boot time.
I didn't think about that caller will do always the right thing so
I will add the check.
> 
> > 
> > And it is needed in case of L0 because it is possible that some
> > addressed were marked with specific flag ( execution, read, write )
> > and
> > so not to overwrite the flags set before the check is needed.
> In which case you should really report an error because the caller
> may 
> have decide to set execution flag and you don't honor. So when the
> code 
> is executed, you will receive a fault and this may be hard to find
> out 
> what happen.

Right now, it is expected situation that the caller will try to change
execution flag during the setup of initial page tables.

It is possible in the currently implemented logic of the setup of
initial page tables.

Let me explain what I mean.

The first step of setup_initial_pagetables() is to map .text, .init,
.rodata with necessary flags RX, RX, R.
Remaining sections will have RW flags, and to map them,
setup_initial_mapping() is called for the whole range of [linker_start,
linker_end] not to map them one by one thereby during this step
setup_initial_mapping() will try to remap addresses ranges which
overlap with .text, .init, .rodata with RW flags but it should leave
with the previously set flags.
> 
> > 
> > > the next page table but not leaf PTE.But here you still only
> > > check
> > > for a single level.
> > > 
> > > Furthermore, I would strongly suggest to also check the valid PTE
> > > is
> > > the
> > > same as you intend to write to catch any override (they are a
> > > pain to
> > > debug).
> > but if load addresses and linker addresses don't overlap is it
> > possible
> > situation that valid PTE will be overridden?
> 
> A bug in the code. In fact, if you add the check you would have
> notice 
> that your existing code is buggy (see below).
> 
> > > 
> > > > +        {
> > > > +            /* Update level0 table */
> > > > +            zeroeth[index0] = paddr_to_pte((page_addr -
> > > > map_start)
> > > > + pa_start);
> > > > +            zeroeth[index0].pte |= flags;
> > > > +        }
> > > > +
> > > > +        /* Point to next page */
> > > > +        page_addr += XEN_PT_LEVEL_SIZE(0);
> > > > +    }
> > > > +}
> > > > +
> > > > +/*
> > > > + * setup_initial_pagetables:
> > > > + *
> > > > + * Build the page tables for Xen that map the following:
> > > > + *   load addresses to linker addresses
> 
> I would suggest to expand because this is not entirely what you
> exactly 
> are doing. In fact...
> 
> > > > + */
> > > > +void __init setup_initial_pagetables(void)
> > > > +{
> > > > +    pte_t *second;
> > > > +    pte_t *first;
> > > > +    pte_t *zeroeth;
> > > > +
> > > > +    unsigned long load_addr_start   = boot_info.load_start;
> > > > +    unsigned long load_addr_end     = boot_info.load_end;
> > > > +    unsigned long linker_addr_start = boot_info.linker_start;
> > > > +    unsigned long linker_addr_end   = boot_info.linker_end;
> > > > +
> > > > +    BUG_ON(load_addr_start % (PAGE_ENTRIES * PAGE_SIZE) != 0);
> > > > +    if (load_addr_start != linker_addr_start)
> > > > +        BUG_ON((linker_addr_end > load_addr_start &&
> > > > load_addr_end
> > > > > linker_addr_start));
> > > 
> > > I would suggest to switch to a panic() with an error message as
> > > this
> > > would help the user understanding what this is breaking.
> > > 
> > > Alternatively, you could document what this check is for.
> > I think I will document it for now as panic() isn't ready for use
> > now.
> > > 
> > > > +
> > > > +    /* Get the addresses where the page tables were loaded */
> > > > +    second  = (pte_t *)(&xen_second_pagetable);
> > > > +    first   = (pte_t *)(&xen_first_pagetable);
> > > > +    zeroeth = (pte_t *)(&xen_zeroeth_pagetable);
> > > > +
> > > > +    setup_initial_mapping(second, first, zeroeth,
> > > > +                          LOAD_TO_LINK((unsigned
> > > > long)&_stext),
> > > > +                          LOAD_TO_LINK((unsigned
> > > > long)&_etext),
> > > > +                          (unsigned long)&_stext,
> > > > +                          PTE_LEAF_DEFAULT);
> > > > +
> > > > +    setup_initial_mapping(second, first, zeroeth,
> > > > +                          LOAD_TO_LINK((unsigned
> > > > long)&__init_begin),
> > > > +                          LOAD_TO_LINK((unsigned
> > > > long)&__init_end),
> > > > +                          (unsigned long)&__init_begin,
> > > > +                          PTE_LEAF_DEFAULT | PTE_WRITABLE);
> > > > +
> > > > +    setup_initial_mapping(second, first, zeroeth,
> > > > +                          LOAD_TO_LINK((unsigned
> > > > long)&_srodata),
> > > > +                          LOAD_TO_LINK((unsigned
> > > > long)&_erodata),
> > > > +                          (unsigned long)(&_srodata),
> > > > +                          PTE_VALID | PTE_READABLE);
> > > > +
> > > > +    setup_initial_mapping(second, first, zeroeth,
> > > > +                          linker_addr_start,
> > > > +                          linker_addr_end,
> > > > +                          load_addr_start,
> > > > +                          PTE_LEAF_DEFAULT | PTE_READABLE);
> 
> ... this is not cover above. AFAIU, this is the one for the 1:1
> mapping.
But there is no 1:1 mapping.
I thought that 1:1 mapping is when the physical address is equal to
0x10020 then after MMU is enabled the virtual address will be the same
0x10020 and mapped to 0x10020.

Probably I missed something but this code maps all linker addresses
to correspondent load addresses and it will be 1:1 only in case when
load address is equal to linker address.
> 
> > > 
> > > As I said in v1, you need to use a different set of page-table
> > > here.
> > If I understand you correctly I have to use a different set of
> > page-
> > table in case when it is possible that size of Xen will be larger
> > then
> > PAGE_SIZE. So I added to xen.lds.S a check to be sure that the size
> > fits into PAGE_SIZE.
> 
> This is not what I was referring to. I was pointing out that second, 
> first, zeroeth are exactly the same for all the callers. You want to 
> introduce a second set of zeroeth table. You will want to do the same
> for first but it is not always used.
> 
> Otherwise, this is not going to work if Xen is loaded at a different 
> address than the runtime.
Ok.

I understand what do you mean in general but still I am not sure that I
understand a particular case when I am sure that the size of Xen is no
bigger then 2MB and load address is aligned on 2Mb boundary.

The size of one l0 page table is 512 (1 << 9 ) entries which covers 4K
(1 << 12) * 512 = 2 Mb of memory so it should be fine.
All the callers in my case are trying to map addresses from
[linker_start, linker_end] which is not bigger then 2 MB and is aligned
on 2 MB boundary.

> 
> That said, when I spoke with Andrew yesterday, he mentioned that your
> initial goal is to support the case where Xen is loaded at the
> runtime 
> address. I understand this simplifies a lot the code and I told him I
> was OK with that. However, it would be good to document what are your
> goals in each series (this is not always clear what you skip on
> purpose).
> 
> > 
> > > Also, where do you guarantee that Xen will be loaded at a 2MB
> > > aligned
> > > address? (For a fact I know that UEFI is only guarantee 4KB
> > > alignment).
> > There is a check in setup_initial_pagetables:
> >       BUG_ON(load_addr_start % (PAGE_ENTRIES * PAGE_SIZE) != 0);
> 
> This is not very obvious the check is to confirm that Xen is probably
> aligned. I would suggest to add a comment.
> 
> Also, you might want to use XEN_PT_LEVEL_SIZE(..) to make it more 
> obvious what sort of alignment you are trying to enforce.
I will update add the comment and use XEN_PT_LEVEL_SIZE(...) instead.
> 
> > > > +        la      sp, cpu0_boot_stack
> > > > +        li      t0, STACK_SIZE
> > > > +        add     sp, sp, t0
> > > > +
> > > > +        /*
> > > > +         * Re-init an address of exception handler as it was
> > > > overwritten  with
> > > > +         * the address of the .L_mmu_is_enabled label.
> > > > +         * Before jump to trap_init save return address of
> > > > enable_mmu() to
> > > > +         * know where we should back after enable_mmu() will
> > > > be
> > > > finished.
> > > > +         */
> > > > +        mv      s0, ra
> > > > +        lla     t0, trap_init
> > > > +        jalr    ra, t0
> > > > +
> > > > +        /*
> > > > +         * Re-calculate the return address of enable_mmu()
> > > > function for case
> > > > +         * when linker start address isn't equal to load start
> > > > address
> > > > +         */
> > > > +        add     s0, s0, t1
> > > > +        mv      ra, s0
> > > > +
> > > > +        ret
> > > 
> > > Missing ENDPROC?
> > I haven't seen the usage of ENDPROC for RISC-V so it looks like it
> > is
> > not necessary.
> 
> Ok. Would the objdump be able to report the function properly then? I
> know that on Arm, it was necessary report assembly function properly.
It is fine for RISC-V:

Disassembly of section .text:

0000000000200000 <_start>:
  200000:       10401073                csrw    sie,zero
  200004:       6299                    lui     t0,0x6
...
  20003c:       00000013                nop

0000000000200040 <enable_mmu>:
  200040:       0003a297                auipc   t0,0x3a
...
  20009e:       941a                    add     s0,s0,t1
  2000a0:       80a2                    mv      ra,s0
  2000a2:       8082                    ret
        ...

00000000002000b0 <__bitmap_empty>:
  2000b0:       1141                    addi    sp,sp,-16
  2000b2:       e422                    sd      s0,8(sp)
  2000b4:       0800                    addi    s0,sp,16
...

> 
> > > 
> > > > diff --git a/xen/arch/riscv/xen.lds.S
> > > > b/xen/arch/riscv/xen.lds.S
> > > > index eed457c492..e4ac4e84b6 100644
> > > > --- a/xen/arch/riscv/xen.lds.S
> > > > +++ b/xen/arch/riscv/xen.lds.S
> > > > @@ -179,3 +179,5 @@ SECTIONS
> > > >    
> > > >    ASSERT(!SIZEOF(.got),      ".got non-empty")
> > > >    ASSERT(!SIZEOF(.got.plt),  ".got.plt non-empty")
> > > > +
> > > > +ASSERT(_end - _start <= MB(2), "Xen too large for early-boot
> > > > assumptions")
> > > 
> > ~ Oleksii
> > 
> 
~ Oleksii



  reply	other threads:[~2023-03-23 11:18 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-16 16:43 [PATCH v2 0/3] enable MMU for RISC-V Oleksii Kurochko
2023-03-16 16:43 ` [PATCH v2 1/3] xen/riscv: introduce setup_initial_pages Oleksii Kurochko
2023-03-20 16:41   ` Jan Beulich
2023-03-21 17:08     ` Oleksii
2023-03-22  8:12       ` Jan Beulich
2023-03-22  9:22         ` Oleksii
2023-03-21 17:58   ` Julien Grall
2023-03-22  8:19     ` Jan Beulich
2023-03-22 13:55       ` Julien Grall
2023-03-22  9:55     ` Oleksii
2023-03-22 14:21       ` Julien Grall
2023-03-23 11:18         ` Oleksii [this message]
2023-03-23 11:57           ` Julien Grall
2023-03-23 12:30             ` Oleksii
2023-03-23 12:44               ` Julien Grall
2023-03-27 10:50     ` Oleksii
2023-03-27 10:54       ` Julien Grall
2023-03-16 16:43 ` [PATCH v2 2/3] xen/riscv: setup initial pagetables Oleksii Kurochko
2023-03-20 17:03   ` Jan Beulich
2023-03-21 17:09     ` Oleksii
2023-03-16 16:43 ` [PATCH v2 3/3] xen/riscv: remove dummy_bss variable Oleksii Kurochko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c9f4b34c4aef31906e715c7ddce8e077e5eef52c.camel@gmail.com \
    --to=oleksii.kurochko@gmail.com \
    --cc=alistair.francis@wdc.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=bobbyeshleman@gmail.com \
    --cc=connojdavis@gmail.com \
    --cc=gianluca@rivosinc.com \
    --cc=jbeulich@suse.com \
    --cc=julien@xen.org \
    --cc=sstabellini@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).