Re: [PATCH v4 07/11] xen: add cache coloring allocator for domains

From: Jan Beulich <jbeulich@suse.com>
To: Carlo Nonato <carlo.nonato@minervasys.tech>
Cc: Luca Miccio <lucmiccio@gmail.com>,
	Stefano Stabellini <sstabellini@kernel.org>,
	Julien Grall <julien@xen.org>,
	Bertrand Marquis <bertrand.marquis@arm.com>,
	Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	George Dunlap <george.dunlap@citrix.com>, Wei Liu <wl@xen.org>,
	Marco Solieri <marco.solieri@minervasys.tech>,
	xen-devel@lists.xenproject.org
Subject: Re: [PATCH v4 07/11] xen: add cache coloring allocator for domains
Date: Fri, 27 Jan 2023 14:31:53 +0100	[thread overview]
Message-ID: <c843b031-52f7-056d-e8c0-75fe9c426343@suse.com> (raw)
In-Reply-To: <CAG+AhRXvKi4HmPoOL7cLToCgOPf3-evvcdvSzYGZ6fLc+mBvtQ@mail.gmail.com>

On 27.01.2023 11:17, Carlo Nonato wrote:
> On Thu, Jan 26, 2023 at 5:29 PM Jan Beulich <jbeulich@suse.com> wrote:
>> On 23.01.2023 16:47, Carlo Nonato wrote:
>>> --- a/xen/arch/arm/include/asm/mm.h
>>> +++ b/xen/arch/arm/include/asm/mm.h
>>> @@ -128,6 +128,9 @@ struct page_info
>>>  #else
>>>  #define PGC_static     0
>>>  #endif
>>> +/* Page is cache colored */
>>> +#define _PGC_colored      PG_shift(4)
>>> +#define PGC_colored       PG_mask(1, 4)
>>
>> Is there a reason you don't follow the conditional approach we've taken
>> for PGC_static?
> 
> Nope, fixed that.

And btw, if at all possible please avoid the #else part. page_alloc.c
already deals with that case, so it would be needed only if you have a
use of this constant somewhere else.

>>> @@ -1488,7 +1497,7 @@ static void free_heap_pages(
>>>              /* Merge with predecessor block? */
>>>              if ( !mfn_valid(page_to_mfn(predecessor)) ||
>>>                   !page_state_is(predecessor, free) ||
>>> -                 (predecessor->count_info & PGC_static) ||
>>> +                 (predecessor->count_info & (PGC_static | PGC_colored)) ||
>>>                   (PFN_ORDER(predecessor) != order) ||
>>>                   (page_to_nid(predecessor) != node) )
>>>                  break;
>>> @@ -1512,7 +1521,7 @@ static void free_heap_pages(
>>>              /* Merge with successor block? */
>>>              if ( !mfn_valid(page_to_mfn(successor)) ||
>>>                   !page_state_is(successor, free) ||
>>> -                 (successor->count_info & PGC_static) ||
>>> +                 (successor->count_info & (PGC_static | PGC_colored)) ||
>>>                   (PFN_ORDER(successor) != order) ||
>>>                   (page_to_nid(successor) != node) )
>>>                  break;
>>
>> This, especially without being mentioned in the description (only in the
>> revision log), could likely also be split out (and then also be properly
>> justified).
> 
> You mean to make it a prereq patch or putting it after this patch?

A prereq, for ...

> In the second case it would be like a fix for the previous patch, something
> that is to be avoided, right?

... precisely this reason. Elsewhere you make a constant from
PGC_extra | PGC_static | PGC_colored. Maybe that could become a file scope
one, which you could then use here as well. Then the change wouldn't even
depend on you already having introduced PGC_colored (but otherwise just
having the stub #define earlier in the file would suffice as well for this
to compile fine independent of the bulk of the changes).

(FTAOD PGC_extra would be meaningless / benign here, for only ever being
set on allocated pages.)

>>> +static void free_color_heap_page(struct page_info *pg)
>>> +{
>>> +    unsigned int color = page_to_llc_color(pg), nr_colors = get_nr_llc_colors();
>>> +    unsigned long pdx = page_to_pdx(pg);
>>> +    colored_pages_t *head = color_heap(color);
>>> +    struct page_info *prev = pdx >= nr_colors ? pg - nr_colors : NULL;
>>> +    struct page_info *next = pdx + nr_colors < FRAMETABLE_NR ? pg + nr_colors
>>> +                                                             : NULL;
>>
>> Are these two calculations safe? At least on x86 parts of frame_table[] may
>> not be populated, so de-referencing prev and/or next might fault.
> 
> I have to admit I've not taken this into consideration. I'll check that.
> 
>>> +    spin_lock(&heap_lock);
>>> +
>>> +    if ( is_free_colored_page(prev) )
>>> +        next = page_list_next(prev, head);
>>> +    else if ( !is_free_colored_page(next) )
>>> +    {
>>> +        /*
>>> +         * FIXME: linear search is slow, but also note that the frametable is
>>> +         * used to find free pages in the immediate neighborhood of pg in
>>> +         * constant time. When freeing contiguous pages, the insert position of
>>> +         * most of them is found without the linear search.
>>> +         */
>>> +        page_list_for_each( next, head )
>>> +        {
>>> +            if ( page_to_maddr(next) > page_to_maddr(pg) )
>>> +                break;
>>> +        }
>>> +    }
>>> +
>>> +    mark_page_free(pg, page_to_mfn(pg));
>>> +    pg->count_info |= PGC_colored;
>>> +    free_colored_pages[color]++;
>>> +    page_list_add_prev(pg, next, head);
>>> +
>>> +    spin_unlock(&heap_lock);
>>> +}
>>
>> There's no scrubbing here at all, and no mention of the lack thereof in
>> the description.
> 
> This comes from the very first version (which I'm not an author of).
> Can you explain to me briefly what is it and if I need it? Or can you give
> me pointers?

Did you look for occurrences of the word "scrub" elsewhere in the file?
You want to avoid pages holding information from one guest to become
visible unchanged to another one.

>>> +static void __init init_color_heap_pages(struct page_info *pg,
>>> +                                         unsigned long nr_pages)
>>> +{
>>> +    unsigned int i;
>>> +
>>> +    if ( buddy_alloc_size )
>>> +    {
>>> +        unsigned long buddy_pages = min(PFN_DOWN(buddy_alloc_size), nr_pages);
>>> +
>>> +        init_heap_pages(pg, buddy_pages);
>>> +        nr_pages -= buddy_pages;
>>> +        buddy_alloc_size -= buddy_pages << PAGE_SHIFT;
>>> +        pg += buddy_pages;
>>> +    }
>>
>> I think you want to bail here if nr_pages is now zero, not the least to avoid
>> crashing ...
>>
>>> +    if ( !_color_heap )
>>> +    {
>>> +        unsigned int nr_colors = get_nr_llc_colors();
>>> +
>>> +        _color_heap = xmalloc_array(colored_pages_t, nr_colors);
>>> +        BUG_ON(!_color_heap);
>>> +        free_colored_pages = xzalloc_array(unsigned long, nr_colors);
>>> +        BUG_ON(!free_colored_pages);
>>
>> ... here in case the amount that was freed was really tiny.
> 
> Here the array is allocated with nr_colors not nr_pages. nr_colors can't be 0.
> And if nr_pages is 0 it just means that all pages went to the buddy. I see no
> crash in this case.

Aren't the two BUG_ON() not very clear crash causes? My comment wasn't
about nr_pages == 0 being an issue further down (I think I had convinced
myself that this case was handled correctly), but about the buddy
allocator still having too little memory to satisfy the two allocations
here (which also you don't need yet if there's no memory to go to the
colored allocator in the first place).

>>> --- a/xen/include/xen/mm.h
>>> +++ b/xen/include/xen/mm.h
>>> @@ -299,6 +299,33 @@ page_list_add_tail(struct page_info *page, struct page_list_head *head)
>>>      }
>>>      head->tail = page;
>>>  }
>>> +static inline void
>>> +_page_list_add(struct page_info *page, struct page_info *prev,
>>> +               struct page_info *next)
>>> +{
>>> +    page->list.prev = page_to_pdx(prev);
>>> +    page->list.next = page_to_pdx(next);
>>> +    prev->list.next = page_to_pdx(page);
>>> +    next->list.prev = page_to_pdx(page);
>>> +}
>>> +static inline void
>>> +page_list_add_prev(struct page_info *page, struct page_info *next,
>>> +                   struct page_list_head *head)
>>> +{
>>> +    struct page_info *prev;
>>> +
>>> +    if ( !next )
>>> +    {
>>> +        page_list_add_tail(page, head);
>>> +        return;
>>> +    }
>>
>> !next is ambiguous in its meaning, so a comment towards the intended
>> behavior here would be helpful. It could be that the tail insertion is
>> necessary behavior, but it also could be that insertion anywhere would
>> actually be okay, and tail insertion is merely the variant you ended up
>> picking.
> 
> This makes it possible to call the function when next has been used to iterate
> over a list. If there's no next we are at the end of the list, so add to the
> tail. The other way is to handle the case outside this function.
> 
>> Then again ...
>>
>>> +    prev = page_list_prev(next, head);
>>> +    if ( !prev )
>>> +        page_list_add(page, head);
>>> +    else
>>> +        _page_list_add(page, prev, next);
>>> +}
>>>  static inline bool_t
>>>  __page_list_del_head(struct page_info *page, struct page_list_head *head,
>>>                       struct page_info *next, struct page_info *prev)
>>> @@ -451,6 +478,12 @@ page_list_add_tail(struct page_info *page, struct page_list_head *head)
>>>      list_add_tail(&page->list, head);
>>>  }
>>>  static inline void
>>> +page_list_add_prev(struct page_info *page, struct page_info *next,
>>> +                   struct page_list_head *head)
>>> +{
>>> +    list_add_tail(&page->list, &next->list);
>>
>> ... you don't care about !next here at all?
> 
> I did it this way since *page is never checked for NULL as well. Also, this
> other type of list isn't NULL terminated.

I see. In which case properly explaining the intended behavior and use case
in the earlier function should hopefully eliminate the need for anything
special here. I have to admit though that I consider this a little fragile
for a seemingly generic helper function; I wonder whether the special case
wouldn't better be handled in the caller instead.

Jan