From: Julien Grall <julien@xen.org>
To: Stefano Stabellini <sstabellini@kernel.org>
Cc: Wei Liu <wl@xen.org>,
	andrew.cooper3@citrix.com,
	Ian Jackson <ian.jackson@eu.citrix.com>,
	George Dunlap <george.dunlap@citrix.com>,
	Jan Beulich <jbeulich@suse.com>,
	xen-devel@lists.xenproject.org,
	Stefano Stabellini <stefano.stabellini@xilinx.com>,
	Volodymyr_Babchuk@epam.com, "Woodhouse,
	David" <dwmw@amazon.co.uk>
Subject: Re: [PATCH 05/12] xen: introduce reserve_heap_pages
Date: Thu, 30 Apr 2020 19:27:55 +0100	[thread overview]
Message-ID: <86e8fa89-c6f5-6c9e-4f3e-7f98e8e12c6a@xen.org> (raw)
In-Reply-To: <alpine.DEB.2.21.2004300928240.28941@sstabellini-ThinkPad-T480s>

Hi,

On 30/04/2020 18:00, Stefano Stabellini wrote:
> On Thu, 30 Apr 2020, Julien Grall wrote:
>>>>> +    pg = maddr_to_page(start);
>>>>> +    node = phys_to_nid(start);
>>>>> +    zone = page_to_zone(pg);
>>>>> +    page_list_del(pg, &heap(node, zone, order));
>>>>> +
>>>>> +    __alloc_heap_pages(pg, order, memflags, d);
>>>>
>>>> I agree with Julien in not seeing how this can be safe / correct.
>>>
>>> I haven't seen any issues so far in my testing -- I imagine it is
>>> because there aren't many memory allocations after setup_mm() and before
>>> create_domUs()  (which on ARM is called just before
>>> domain_unpause_by_systemcontroller at the end of start_xen.)
>>
>> I am not sure why you exclude setup_mm(). Any memory allocated (boot
>> allocator, xenheap) can clash with your regions. The main memory allocations
>> are for the frametable and dom0. I would say you were lucky to not hit them.
> 
> Maybe it is because Xen typically allocates memory top-down? So if I
> chose a high range then I would see a failure? But I have been mostly
> testing with ranges close to the beginning of RAM (as opposed to
> ranges close to the end of RAM).

I haven't looked at the details of the implementation, but you can try 
to specify dom0 addresses for your domU. You should see a failure.

> 
>   
>>> I gave a quick look at David's series. Is the idea that I should add a
>>> patch to do the following:
>>>
>>> - avoiding adding these ranges to xenheap in setup_mm, wait for later
>>>     (a bit like reserved_mem regions)
>>
>> I guess by xenheap, you mean domheap? But the problem is not only for domheap,
>> it is also for any memory allocated via the boot allocator. So you need to
>> exclude those regions from any possible allocations.
> 
> OK, I think we are saying the same thing but let me check.
> 
> By boot allocator you mean alloc_boot_pages, right? That boot allocator
> operates on ranges given to it by init_boot_pages calls.

That's correct.

> init_boot_pages is called from setup_mm. I didn't write it clearly but
> I also meant not calling init_boot_pages on them from setup_mm.
> 
> Are we saying the same thing?

Yes.

> 
> 
>>> - in construct_domU, add the range to xenheap and reserve it with
>>> reserve_heap_pages
>>
>> I am afraid you can't give the regions to the allocator and then allocate
>> them. The allocator is free to use any page for its own purpose or exclude
>> them.
>>
>> AFAICT, the allocator doesn't have a list of pages in use. It only keeps track
>> of free pages. So we can make the content of struct page_info look as if it
>> was allocated by the allocator.
>>
>> We would need to be careful when giving a page back to the allocator as the
>> page would need to be initialized (see [1]). This may not be a concern for
>> Dom0less as the domain may never be destroyed, but it will be from a
>> correctness PoV.
>>
>> For LiveUpdate, the original Xen will carve out space to use by the boot
>> allocator in the new Xen. But I think this is not necessary in your context.
>>
>> It should be sufficient to exclude the page from the boot allocators (as we do
>> for other modules).
>>
>> One potential issue that can arise is that there is no easy way today to
>> differentiate between pages allocated and pages not yet initialized. To make
>> the code robust, we need to prevent a page from being used in two places. So
>> for LiveUpdate we are marking them with a special value, which is used
>> afterwards to check we are effectively using a reserved page.
>>
>> I hope this helps.
> 
> Thanks for writing all of this down but I haven't understood some of it.
> 
> For the sake of this discussion let's say that we managed to "reserve"
> the range early enough like we do for other modules, as you wrote.
> 
> At the point where we want to call reserve_heap_pages() we would call
> init_heap_pages() just before it. We are still relatively early at boot
> so there aren't any concurrent memory operations. Why this doesn't work?

Because init_heap_pages() may exclude some pages (for instance MFN 0 is 
carved out) or use pages for its internal structure (see 
init_node_heap()). So you can't expect to be able to allocate the exact 
same region by reserve_heap_pages().

> 
> If it doesn't work, I am not following what is your alternative
> suggestion about making "the content of struct page_info to look like it
> was allocated by the allocator."

If you look at alloc_heap_pages(), it will allocate pages and the 
allocator will initialize some fields in struct page_info before 
returning the page. We basically need to do the same thing, so the 
struct page_info looks exactly the same whether we call 
alloc_heap_pages() or use memory that was carved out from the allocator.

David has spent more time than me on this problem, so I may be missing 
some bits. Based on what we did in the LU PoC, my suggestion would be to:
    1) Carve out the memory from any allocator (and before any memory is 
allocated).
    2) Make sure a struct page_info is allocated for those regions in 
the boot allocator
    3) Mark the regions as reserved in the frametable so we can 
differentiate them from the other pages.
    4) Allocate the region when necessary

When it is necessary to allocate the region, for each page:
    1) Check if it is a valid page
    2) Check if the page is reserved
    3) Do the necessary preparation on struct page_info

At the moment, in the LU PoC, we are using count_info = PGC_allocated to 
mark the reserved page. I don't particularly like it and I am not sure 
of the consequences. So I am open to a different way to mark them.

The last part we need to take care of is how to hand over the pages to 
the allocator. This may happen if your domain dies or during ballooning 
(although not in the direct map case). Even without this series, this is 
actually already a problem today because boot allocator pages may be 
freed afterwards (I think this can only happen on x86 so far). But we 
are getting away with it because in most cases you never carve out a 
full NUMA node. This is where David's patch should help.

Cheers,

-- 
Julien Grall

