On Fri, 2020-03-20 at 13:33 +0000, Paul Durrant wrote: > > -----Original Message----- > > From: Xen-devel On Behalf Of David Woodhouse > > Sent: 19 March 2020 21:22 > > To: xen-devel@lists.xenproject.org > > Cc: Stefano Stabellini ; Julien Grall ; Wei Liu ; > > Andrew Cooper ; Ian Jackson ; George Dunlap > > ; hongyxia@amazon.com; Jan Beulich ; Volodymyr Babchuk > > ; Roger Pau Monné > > Subject: [Xen-devel] [PATCH 2/2] xen/mm: Introduce PGC_state_uninitialised > > > > From: David Woodhouse > > > > It is possible for pages to enter general circulation without ever > > being process by init_heap_pages(). > > > > For example, pages of the multiboot module containing the initramfs may > > be assigned via assign_pages() to dom0 as it is created. And some code > > including map_pages_to_xen() has checks on 'system_state' to determine > > whether to use the boot or the heap allocator, but it seems impossible > > to prove that pages allocated by the boot allocator are not subsequently > > freed with free_heap_pages(). > > > > This actually works fine in the majority of cases; there are only a few > > esoteric corner cases which init_heap_pages() handles before handing the > > page range off to free_heap_pages(): > > • Excluding MFN #0 to avoid inappropriate cross-zone merging. > > • Ensuring that the node information structures exist, when the first > > page(s) of a given node are handled. > > • High order allocations crossing from one node to another. > > > > To handle this case, shift PG_state_inuse from its current value of > > zero, to another value. Use zero, which is the initial state of the > > entire frame table, as PG_state_uninitialised. > > > > Fix a couple of assertions which were assuming that PG_state_inuse is > > zero, and make them cope with the PG_state_uninitialised case too where > > appopriate. > > > > Finally, make free_heap_pages() call through to init_heap_pages() when > > given a page range which has not been initialised. This cannot keep > > recursing because init_heap_pages() will set each page state to > > PGC_state_inuse before passing it back to free_heap_pages() for the > > second time. > > > > Signed-off-by: David Woodhouse > > --- > > xen/arch/x86/mm.c | 3 ++- > > xen/common/page_alloc.c | 44 +++++++++++++++++++++++++++++----------- > > xen/include/asm-arm/mm.h | 3 ++- > > xen/include/asm-x86/mm.h | 3 ++- > > 4 files changed, 38 insertions(+), 15 deletions(-) > > > > diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c > > index 62507ca651..5f0581c072 100644 > > --- a/xen/arch/x86/mm.c > > +++ b/xen/arch/x86/mm.c > > @@ -491,7 +491,8 @@ void share_xen_page_with_guest(struct page_info *page, struct domain *d, > > > > page_set_owner(page, d); > > smp_wmb(); /* install valid domain ptr before updating refcnt. */ > > - ASSERT((page->count_info & ~PGC_xen_heap) == 0); > > + ASSERT((page->count_info & ~PGC_xen_heap) == PGC_state_inuse || > > + (page->count_info & ~PGC_xen_heap) == PGC_state_uninitialised); > > Could the page state perhaps be bumped to inuse in this case? It > seems odd to leave state uninitialized yet succeed in sharing with a > guest. No, that doesn't really work. You can't just *declare* that the page was properly initialised, because it isn't true. There's a pathological case where the zone hasn't been initialised at all, because *all* the pages in that zone got handed out by the boot allocator or used for initrd etc. The first pages 'freed' in that zone end up being used (in init_heap_pages) to create the zone structures. Likewise, it could include a page which init_heap_pages() doesn't actually *put* into the buddy allocator, to work around the cross-zone merge problem. It's fine to use that page and share it with a guest, but it can't ever be freed into the buddy allocator. > > @@ -1828,7 +1840,8 @@ static void init_heap_pages( > > nr_pages -= n; > > } > > > > - free_heap_pages(pg + i, 0, scrub_debug || idle_scrub); > > Would it be worth an ASSERT(!pg[i].count_info) here in case something > is added which erroneously modifies the page count info before this > is done? That seems valid, I think. Will test it. > > > > page_set_owner(&pg[i], d); > > smp_wmb(); /* Domain pointer must be visible before updating refcnt. */ > > - pg[i].count_info = > > - (pg[i].count_info & PGC_extra) | PGC_allocated | 1; > > + pg[i].count_info = (pg[i].count_info & PGC_state) | PGC_allocated | 1; > > The PGC_extra seems to have vapourized here. Oops. Will fix; thanks.