xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Julien Grall <julien@xen.org>
To: Oleksii <oleksii.kurochko@gmail.com>, xen-devel@lists.xenproject.org
Cc: Jan Beulich <jbeulich@suse.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Stefano Stabellini <sstabellini@kernel.org>,
	Gianluca Guida <gianluca@rivosinc.com>,
	Bob Eshleman <bobbyeshleman@gmail.com>,
	Alistair Francis <alistair.francis@wdc.com>,
	Connor Davis <connojdavis@gmail.com>
Subject: Re: [PATCH v2 1/3] xen/riscv: introduce setup_initial_pages
Date: Thu, 23 Mar 2023 12:44:07 +0000	[thread overview]
Message-ID: <79e83610-5980-d9b5-7994-6b0cb2b9049a@xen.org> (raw)
In-Reply-To: <1da599963f20f84c84a9114e10776da3ed0d35e2.camel@gmail.com>

Hi Oleksii,

On 23/03/2023 12:30, Oleksii wrote:
> On Thu, 2023-03-23 at 11:57 +0000, Julien Grall wrote:
>> On 23/03/2023 11:18, Oleksii wrote:
>>> Hi Julien,
>>
>> Hi Oleksii,
>>
>>> On Wed, 2023-03-22 at 14:21 +0000, Julien Grall wrote:
>>> ...
>>>>
>>>>>>> +    unsigned long page_addr;
>>>>>>> +
>>>>>>> +    // /* align start addresses */
>>>>>>> +    // map_start &= XEN_PT_LEVEL_MAP_MASK(0);
>>>>>>> +    // pa_start &= XEN_PT_LEVEL_MAP_MASK(0);
>>>>>>
>>>>>> They should be switched to ASSERT() or BUG_ON().
>>>>> Sure. Thanks. I'll update.
>>>>>>
>>>>>>> +
>>>>>>> +    page_addr = map_start;
>>>>>>> +    while ( page_addr < map_end )
>>>>>>
>>>>>> This loop is continue to assume that only the mapping can
>>>>>> first
>>>>>> in
>>>>>> 2MB
>>>>>> section (or less if the start is not 2MB aligned).
>>>>>>
>>>>>> I am OK if you want to assume it, but I think this should be
>>>>>> documented
>>>>>> (with words and ASSERT()/BUG_ON()) to avoid any mistake.
>>>>> I add a check in setup_initial_pagetables:
>>>>>         BUG_ON(load_addr_start % (PAGE_ENTRIES * PAGE_SIZE) !=
>>>>> 0);
>>>>> Probably this is not a correct place and should be moved to
>>>>> setup_initial_mapping() instead of setup_initial_pagetables()
>>>>
>>>> Yes it should be moved in setup_initial_mapping().
>>> Then I'll moved it to setup_initial_mapping()
>>>>
>>>>>>
>>>>>>> +    {
>>>>>>> +        unsigned long index2 = pt_index(2, page_addr);
>>>>>>> +        unsigned long index1 = pt_index(1, page_addr);
>>>>>>> +        unsigned long index0 = pt_index(0, page_addr);
>>>>>>> +
>>>>>>> +        /* Setup level2 table */
>>>>>>> +        second[index2] = paddr_to_pte((unsigned
>>>>>>> long)first);
>>>>>>> +        second[index2].pte |= PTE_TABLE;
>>>>>>> +
>>>>>>> +        /* Setup level1 table */
>>>>>>> +        first[index1] = paddr_to_pte((unsigned
>>>>>>> long)zeroeth);
>>>>>>> +        first[index1].pte |= PTE_TABLE;
>>>>>>> +
>>>>>>> +
>>>>>>
>>>>>> NIT: Spurious line?
>>>>> Yeah, should be removed. Thanks.
>>>>>>
>>>>>>> +        /* Setup level0 table */
>>>>>>> +        if ( !pte_is_valid(&zeroeth[index0]) )
>>>>>>
>>>>>> On the previous version, you said it should be checked for
>>>>>> each
>>>>>> level.
>>>>> I had a terrible internet connection, and my message wasn't
>>>>> sent.
>>>>
>>>> No worries.
>>>>
>>>>>
>>>>> I decided not to check that l2 and l1 are used only for
>>>>> referring
>>>>> to
>>>>> the next page table but not leaf PTE. So it is safe to
>>>>> overwrite it
>>>>> each time (the addresses of page tables are the same all the
>>>>> time)
>>>>
>>>> You are letting the caller to decide which page-table to use for
>>>> each
>>>> level. So you are at the mercy that caller will do the right
>>>> thing.
>>>>
>>>> IHMO, this is a pretty bad idea because debugging page-tables
>>>> error
>>>> are
>>>> difficult. So it is better to have safety in place. This is not
>>>> worth...
>>>>
>>>>     and
>>>>> probably it will be better from optimization point of view to
>>>>> ignore if
>>>>> clauses.
>>>>
>>>> ... the optimization in particular when this is at boot time.
>>> I didn't think about that caller will do always the right thing so
>>> I will add the check.
>>>>
>>>>>
>>>>> And it is needed in case of L0 because it is possible that some
>>>>> addressed were marked with specific flag ( execution, read,
>>>>> write )
>>>>> and
>>>>> so not to overwrite the flags set before the check is needed.
>>>> In which case you should really report an error because the
>>>> caller
>>>> may
>>>> have decide to set execution flag and you don't honor. So when
>>>> the
>>>> code
>>>> is executed, you will receive a fault and this may be hard to
>>>> find
>>>> out
>>>> what happen.
>>>
>>> Right now, it is expected situation that the caller will try to
>>> change
>>> execution flag during the setup of initial page tables. >
>>> It is possible in the currently implemented logic of the setup of
>>> initial page tables.
>>
>> This sounds like a bug in your caller implementation. You should not
>> try
>> to workaround this in your code updating the page-tables.
>>
>>> Let me explain what I mean.
>>>
>>> The first step of setup_initial_pagetables() is to map .text,
>>> .init,
>>> .rodata with necessary flags RX, RX, R.
>>> Remaining sections will have RW flags, and to map them,
>>> setup_initial_mapping() is called for the whole range of
>>> [linker_start,
>>> linker_end] not to map them one by one thereby during this step
>>> setup_initial_mapping() will try to remap addresses ranges which
>>> overlap with .text, .init, .rodata with RW flags but it should
>>> leave
>>> with the previously set flags.
>> Why do you need to call setup_init_mapping() with the whole range? In
>> fact the only reason I can think this is useful is when you when to
>> create a 1:1 mapping when the linker and load address is different.
> It is needed to not map each section separately one by one as most of
> the sections have the same PTE_FLAGS (Read, Write, eXectuable, etc )
> 
> So it was decided to map the following sections separately as they have
> 'unique' flags:
>   - .text -> RX
>   - .rodata -> R
>   - .init.text -> RX
> 
> All other sections are RW and could be covered by one call of
> setup_init_mapping() for the whole range:
>   - .data.ro_after_init
>   - .data.read_mostly
>   - .data
>   - .init.data
>   - .bss
> So some ranges ( .text, .rodata, .init.text ) from the whole range will
> be skipped as already mapped and the rest sections will be mapped
> during one call of setup_init_mapping().

This approach is very fragile and not scalable because:
  * You can't use setup_initial_mapping() to change the permission flags.
  * You can't catch caller mistakes (e.g. imagine you end up to use a 
different physical region).

I can see two solutions:

   1) Loop through the region page and page and check within permission 
you want (see the loop in setup_pagetables() for Arm).
   2) Re-order the calls so you want first all Xen and then update the 
permission flags as it fits.

I don't have a strong preference between the two options here.

Cheers,

-- 
Julien Grall


  reply	other threads:[~2023-03-23 12:44 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-16 16:43 [PATCH v2 0/3] enable MMU for RISC-V Oleksii Kurochko
2023-03-16 16:43 ` [PATCH v2 1/3] xen/riscv: introduce setup_initial_pages Oleksii Kurochko
2023-03-20 16:41   ` Jan Beulich
2023-03-21 17:08     ` Oleksii
2023-03-22  8:12       ` Jan Beulich
2023-03-22  9:22         ` Oleksii
2023-03-21 17:58   ` Julien Grall
2023-03-22  8:19     ` Jan Beulich
2023-03-22 13:55       ` Julien Grall
2023-03-22  9:55     ` Oleksii
2023-03-22 14:21       ` Julien Grall
2023-03-23 11:18         ` Oleksii
2023-03-23 11:57           ` Julien Grall
2023-03-23 12:30             ` Oleksii
2023-03-23 12:44               ` Julien Grall [this message]
2023-03-27 10:50     ` Oleksii
2023-03-27 10:54       ` Julien Grall
2023-03-16 16:43 ` [PATCH v2 2/3] xen/riscv: setup initial pagetables Oleksii Kurochko
2023-03-20 17:03   ` Jan Beulich
2023-03-21 17:09     ` Oleksii
2023-03-16 16:43 ` [PATCH v2 3/3] xen/riscv: remove dummy_bss variable Oleksii Kurochko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=79e83610-5980-d9b5-7994-6b0cb2b9049a@xen.org \
    --to=julien@xen.org \
    --cc=alistair.francis@wdc.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=bobbyeshleman@gmail.com \
    --cc=connojdavis@gmail.com \
    --cc=gianluca@rivosinc.com \
    --cc=jbeulich@suse.com \
    --cc=oleksii.kurochko@gmail.com \
    --cc=sstabellini@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).