From: Jan Beulich <jbeulich@suse.com>
To: Carlo Nonato <carlo.nonato@minervasys.tech>
Cc: Luca Miccio <lucmiccio@gmail.com>,
	Stefano Stabellini <sstabellini@kernel.org>,
	Julien Grall <julien@xen.org>,
	Bertrand Marquis <bertrand.marquis@arm.com>,
	Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	George Dunlap <george.dunlap@citrix.com>, Wei Liu <wl@xen.org>,
	Marco Solieri <marco.solieri@minervasys.tech>,
	xen-devel@lists.xenproject.org
Subject: Re: [PATCH v4 07/11] xen: add cache coloring allocator for domains
Date: Thu, 26 Jan 2023 17:29:39 +0100	[thread overview]
Message-ID: <21f1e368-5d44-b689-0bb7-164a53e5ffd7@suse.com> (raw)
In-Reply-To: <20230123154735.74832-8-carlo.nonato@minervasys.tech>

On 23.01.2023 16:47, Carlo Nonato wrote:
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -299,6 +299,20 @@ can be maintained with the pv-shim mechanism.
>      cause Xen not to use Indirect Branch Tracking even when support is
>      available in hardware.
>  
> +### buddy-alloc-size (arm64)

I can't find where such a command line option would be processed.

> --- a/xen/arch/arm/include/asm/mm.h
> +++ b/xen/arch/arm/include/asm/mm.h
> @@ -128,6 +128,9 @@ struct page_info
>  #else
>  #define PGC_static     0
>  #endif
> +/* Page is cache colored */
> +#define _PGC_colored      PG_shift(4)
> +#define PGC_colored       PG_mask(1, 4)

Is there a reason you don't follow the conditional approach we've taken
for PGC_static?

Thinking of which - what are the planned interactions with the static
allocator? If the two are exclusive of one another, I guess that would
need expressing somehow.
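
For comparison, the PGC_static-style conditional pattern would look roughly like this (stand-alone sketch: the PG_shift()/PG_mask() stand-ins are simplified local mocks, not the real Xen macros):

```c
#include <limits.h>

#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))
/* Simplified stand-ins for Xen's PG_shift()/PG_mask() count_info helpers. */
#define PG_shift(idx) (BITS_PER_LONG - (idx))
#define PG_mask(x, idx) ((unsigned long)(x) << PG_shift(idx))

/*
 * Mirroring the PGC_static pattern: the flag compiles away to 0 when the
 * feature is off, so tests against it become dead code.
 */
#ifdef CONFIG_LLC_COLORING
/* Page is cache colored */
#define _PGC_colored PG_shift(4)
#define PGC_colored  PG_mask(1, 4)
#else
#define PGC_colored  0UL
#endif

static unsigned long colored_bit(void) { return PGC_colored; }
```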

> --- a/xen/arch/arm/llc_coloring.c
> +++ b/xen/arch/arm/llc_coloring.c
> @@ -33,6 +33,8 @@ static paddr_t __ro_after_init addr_col_mask;
>  static unsigned int __ro_after_init dom0_colors[CONFIG_NR_LLC_COLORS];
>  static unsigned int __ro_after_init dom0_num_colors;
>  
> +#define addr_to_color(addr) (((addr) & addr_col_mask) >> PAGE_SHIFT)

You're shifting right by PAGE_SHIFT here just to ...

> @@ -299,6 +301,16 @@ unsigned int *llc_colors_from_str(const char *str, unsigned int *num_colors)
>      return colors;
>  }
>  
> +unsigned int page_to_llc_color(const struct page_info *pg)
> +{
> +    return addr_to_color(page_to_maddr(pg));

... undo the corresponding left shift in page_to_maddr(). Depending on
other uses of addr_col_mask, it may be worthwhile to either change it to,
or accompany it with, a mask that operates on frame numbers.
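
To illustrate with a stand-alone sketch (the 8-color mask and the addresses are invented for the example):

```c
#include <stdint.h>

#define PAGE_SHIFT 12

/* Hypothetical platform: 8 LLC colors => 3 color bits just above PAGE_SHIFT. */
static const uint64_t addr_col_mask = 0x7ULL << PAGE_SHIFT;

/* Current form: mask the address, then shift the color bits back down. */
static unsigned int addr_to_color(uint64_t maddr)
{
    return (maddr & addr_col_mask) >> PAGE_SHIFT;
}

/*
 * Suggested form: keep a mask on frame numbers and skip the shifting
 * altogether, since page_to_maddr() only shifted the frame number up
 * to begin with.
 */
static unsigned int mfn_to_color(uint64_t mfn)
{
    return mfn & (addr_col_mask >> PAGE_SHIFT);
}
```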

> @@ -924,6 +929,13 @@ static struct page_info *get_free_buddy(unsigned int zone_lo,
>      }
>  }
>  
> +/* Initialise fields which have other uses for free pages. */
> +static void init_free_page_fields(struct page_info *pg)
> +{
> +    pg->u.inuse.type_info = PGT_TYPE_INFO_INITIALIZER;
> +    page_set_owner(pg, NULL);
> +}

To limit the size of the functional change, abstracting out this function
could helpfully be done in a separate patch (which could then also likely go
in ahead of time, simplifying things slightly for you as well).

> @@ -1488,7 +1497,7 @@ static void free_heap_pages(
>              /* Merge with predecessor block? */
>              if ( !mfn_valid(page_to_mfn(predecessor)) ||
>                   !page_state_is(predecessor, free) ||
> -                 (predecessor->count_info & PGC_static) ||
> +                 (predecessor->count_info & (PGC_static | PGC_colored)) ||
>                   (PFN_ORDER(predecessor) != order) ||
>                   (page_to_nid(predecessor) != node) )
>                  break;
> @@ -1512,7 +1521,7 @@ static void free_heap_pages(
>              /* Merge with successor block? */
>              if ( !mfn_valid(page_to_mfn(successor)) ||
>                   !page_state_is(successor, free) ||
> -                 (successor->count_info & PGC_static) ||
> +                 (successor->count_info & (PGC_static | PGC_colored)) ||
>                   (PFN_ORDER(successor) != order) ||
>                   (page_to_nid(successor) != node) )
>                  break;

This, especially without being mentioned in the description (only in the
revision log), could likely also be split out (and then also be properly
justified).

> @@ -1928,6 +1937,182 @@ static unsigned long avail_heap_pages(
>      return free_pages;
>  }
>  
> +#ifdef CONFIG_LLC_COLORING
> +/*************************
> + * COLORED SIDE-ALLOCATOR
> + *
> + * Pages are grouped by LLC color in lists which are globally referred to as the
> + * color heap. Lists are populated in end_boot_allocator().
> + * After initialization there will be N lists where N is the number of
> + * available colors on the platform.
> + */
> +typedef struct page_list_head colored_pages_t;

To me this type rather hides information, so I think I would prefer if
you dropped it.

> +static colored_pages_t *__ro_after_init _color_heap;
> +static unsigned long *__ro_after_init free_colored_pages;
> +
> +/* Memory required for buddy allocator to work with colored one */
> +static unsigned long __initdata buddy_alloc_size =
> +    CONFIG_BUDDY_ALLOCATOR_SIZE << 20;

Please don't open-code MB().

> +#define color_heap(color) (&_color_heap[color])
> +
> +static bool is_free_colored_page(struct page_info *page)

const please (and wherever applicable throughout the series)

> +{
> +    return page && (page->count_info & PGC_state_free) &&
> +                   (page->count_info & PGC_colored);
> +}
> +
> +/*
> + * The {free|alloc}_color_heap_page overwrite pg->count_info, but they do it in
> + * the same way as the buddy allocator corresponding functions do:
> + * protecting the access with a critical section using heap_lock.
> + */

I think such a comment would only be useful if you did things differently,
even if just slightly. And indeed I think you do, e.g. by ORing in
PGC_colored below (albeit that's still similar to unprepare_staticmem_pages(),
so perhaps fine without further explanation). Differences are what may need
commenting on (such that the safety thereof can be judged upon).

> +static void free_color_heap_page(struct page_info *pg)
> +{
> +    unsigned int color = page_to_llc_color(pg), nr_colors = get_nr_llc_colors();
> +    unsigned long pdx = page_to_pdx(pg);
> +    colored_pages_t *head = color_heap(color);
> +    struct page_info *prev = pdx >= nr_colors ? pg - nr_colors : NULL;
> +    struct page_info *next = pdx + nr_colors < FRAMETABLE_NR ? pg + nr_colors
> +                                                             : NULL;

Are these two calculations safe? At least on x86 parts of frame_table[] may
not be populated, so de-referencing prev and/or next might fault.

> +    spin_lock(&heap_lock);
> +
> +    if ( is_free_colored_page(prev) )
> +        next = page_list_next(prev, head);
> +    else if ( !is_free_colored_page(next) )
> +    {
> +        /*
> +         * FIXME: linear search is slow, but also note that the frametable is
> +         * used to find free pages in the immediate neighborhood of pg in
> +         * constant time. When freeing contiguous pages, the insert position of
> +         * most of them is found without the linear search.
> +         */
> +        page_list_for_each( next, head )
> +        {
> +            if ( page_to_maddr(next) > page_to_maddr(pg) )
> +                break;
> +        }
> +    }
> +
> +    mark_page_free(pg, page_to_mfn(pg));
> +    pg->count_info |= PGC_colored;
> +    free_colored_pages[color]++;
> +    page_list_add_prev(pg, next, head);
> +
> +    spin_unlock(&heap_lock);
> +}

There's no scrubbing here at all, and no mention of the lack thereof in
the description.

> +static void __init init_color_heap_pages(struct page_info *pg,
> +                                         unsigned long nr_pages)
> +{
> +    unsigned int i;
> +
> +    if ( buddy_alloc_size )
> +    {
> +        unsigned long buddy_pages = min(PFN_DOWN(buddy_alloc_size), nr_pages);
> +
> +        init_heap_pages(pg, buddy_pages);
> +        nr_pages -= buddy_pages;
> +        buddy_alloc_size -= buddy_pages << PAGE_SHIFT;
> +        pg += buddy_pages;
> +    }

I think you want to bail here if nr_pages is now zero, not least to avoid
crashing ...

> +    if ( !_color_heap )
> +    {
> +        unsigned int nr_colors = get_nr_llc_colors();
> +
> +        _color_heap = xmalloc_array(colored_pages_t, nr_colors);
> +        BUG_ON(!_color_heap);
> +        free_colored_pages = xzalloc_array(unsigned long, nr_colors);
> +        BUG_ON(!free_colored_pages);

... here in case the amount that was freed was really tiny.
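
I.e. something along these lines (stand-alone mock with invented _stub/_mock names; the real function of course operates on struct page_info):

```c
#include <stddef.h>

#define PAGE_SHIFT 12
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)
#define min(a, b) ((a) < (b) ? (a) : (b))

static unsigned long buddy_pages_taken;
static void init_heap_pages_stub(unsigned long nr) { buddy_pages_taken += nr; }

/*
 * Mock of init_color_heap_pages() with the suggested early bail; returns
 * whether the colored-heap setup would still run.
 */
static int init_color_heap_pages_mock(unsigned long buddy_alloc_size,
                                      unsigned long nr_pages)
{
    if ( buddy_alloc_size )
    {
        unsigned long buddy_pages = min(PFN_DOWN(buddy_alloc_size), nr_pages);

        init_heap_pages_stub(buddy_pages);
        nr_pages -= buddy_pages;

        if ( !nr_pages )
            return 0; /* everything went to the buddy carve-out */
    }

    /* ... would allocate _color_heap / free_colored_pages and free pages ... */
    return 1;
}
```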

> +        for ( i = 0; i < nr_colors; i++ )
> +            INIT_PAGE_LIST_HEAD(color_heap(i));
> +    }
> +
> +    printk(XENLOG_DEBUG
> +           "Init color heap with %lu pages starting from: %#"PRIx64"\n",
> +           nr_pages, page_to_maddr(pg));
> +
> +    for ( i = 0; i < nr_pages; i++ )
> +        free_color_heap_page(&pg[i]);
> +}
> +
> +static void dump_color_heap(void)
> +{
> +    unsigned int color;
> +
> +    printk("Dumping color heap info\n");
> +    for ( color = 0; color < get_nr_llc_colors(); color++ )
> +        printk("Color heap[%u]: %lu pages\n", color, free_colored_pages[color]);

When there are many colors and most memory is used, you may produce a
lot of output here for just displaying zeros. May I suggest that you
log only non-zero values?
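
Something like this, for illustration (stand-alone mock; printing into a buffer and returning the line count are just for the example):

```c
#include <stdio.h>
#include <stddef.h>

#define NR_COLORS 8

static unsigned long free_colored_pages[NR_COLORS] = { 0, 0, 12, 0, 0, 0, 7, 0 };

/*
 * Sketch of dump_color_heap() skipping empty colors; returns the number
 * of per-color lines emitted so the saving is visible.
 */
static unsigned int dump_color_heap_mock(char *buf, size_t len)
{
    unsigned int color, lines = 0;
    size_t off = 0;

    for ( color = 0; color < NR_COLORS; color++ )
    {
        if ( !free_colored_pages[color] )
            continue; /* don't print rows of zeros */

        off += snprintf(buf + off, len - off, "Color heap[%u]: %lu pages\n",
                        color, free_colored_pages[color]);
        lines++;
    }

    return lines;
}
```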

> +}
> +
> +#else /* !CONFIG_LLC_COLORING */
> +
> +static void free_color_heap_page(struct page_info *pg) {}
> +static void __init init_color_heap_pages(struct page_info *pg,
> +                                         unsigned long nr_pages) {}
> +static struct page_info *alloc_color_heap_page(unsigned int memflags,
> +                                               struct domain *d)
> +{
> +    return NULL;
> +}
> +static void dump_color_heap(void) {}

As said elsewhere (albeit for a slightly different reason): It may be
worthwhile to try to omit these stubs and instead expose the normal
code to the compiler unconditionally, relying on DCE. That'll reduce
the risk of people breaking the coloring code without noticing, when
build-testing only other configurations.

> @@ -1936,12 +2121,19 @@ void __init end_boot_allocator(void)
>      for ( i = 0; i < nr_bootmem_regions; i++ )
>      {
>          struct bootmem_region *r = &bootmem_region_list[i];
> -        if ( (r->s < r->e) &&
> -             (mfn_to_nid(_mfn(r->s)) == cpu_to_node(0)) )
> +        if ( r->s < r->e )
>          {
> -            init_heap_pages(mfn_to_page(_mfn(r->s)), r->e - r->s);
> -            r->e = r->s;
> -            break;
> +            if ( llc_coloring_enabled )
> +            {
> +                init_color_heap_pages(mfn_to_page(_mfn(r->s)), r->e - r->s);
> +                r->e = r->s;
> +            }
> +            else if ( mfn_to_nid(_mfn(r->s)) == cpu_to_node(0) )
> +            {
> +                init_heap_pages(mfn_to_page(_mfn(r->s)), r->e - r->s);
> +                r->e = r->s;
> +                break;
> +            }

I think the coloring part here deserves a comment, or else it can easily
look as if there was a missing "break" (or it was placed in the wrong
scope). I also think it might help to restructure your change a little,
both to reduce the diff and to keep indentation bounded:

  if ( r->s >= r->e )
    continue;

  if ( llc_coloring_enabled )
    ...

Also please take the opportunity to add the missing blank lines between
declaration and statements.
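
I.e. a control-flow sketch along these lines (stand-alone mock: the mfn_to_nid()/cpu_to_node() check is elided, and the stubs merely count pages):

```c
#include <stdbool.h>

struct bootmem_region { unsigned long s, e; };

static unsigned long colored_pages, heap_pages;
static void init_color_heap_pages_stub(unsigned long n) { colored_pages += n; }
static void init_heap_pages_stub(unsigned long n) { heap_pages += n; }

/*
 * An early 'continue' keeps indentation flat, and the coloring path
 * deliberately has no 'break', since all regions feed the color heap.
 */
static void end_boot_allocator_loop(struct bootmem_region *r, unsigned int nr,
                                    bool llc_coloring_enabled)
{
    unsigned int i;

    for ( i = 0; i < nr; i++ )
    {
        if ( r[i].s >= r[i].e )
            continue;

        if ( llc_coloring_enabled )
        {
            init_color_heap_pages_stub(r[i].e - r[i].s);
            r[i].e = r[i].s;
        }
        else
        {
            init_heap_pages_stub(r[i].e - r[i].s);
            r[i].e = r[i].s;
            break; /* the node-0 shortcut only consumes one region */
        }
    }
}
```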

> @@ -2332,6 +2524,7 @@ int assign_pages(
>  {
>      int rc = 0;
>      unsigned int i;
> +    unsigned long allowed_flags = (PGC_extra | PGC_static | PGC_colored);

This is one of the few cases where I think "const" would be helpful even
on a not-pointed-to type. There's also not really any need for parentheses
here. As to the name, ...

> @@ -2349,7 +2542,7 @@ int assign_pages(
>  
>          for ( i = 0; i < nr; i++ )
>          {
> -            ASSERT(!(pg[i].count_info & ~(PGC_extra | PGC_static)));
> +            ASSERT(!(pg[i].count_info & ~allowed_flags));

... while "allowed" may be fine for this use, it really isn't ...

> @@ -2408,8 +2601,8 @@ int assign_pages(
>          ASSERT(page_get_owner(&pg[i]) == NULL);
>          page_set_owner(&pg[i], d);
>          smp_wmb(); /* Domain pointer must be visible before updating refcnt. */
> -        pg[i].count_info =
> -            (pg[i].count_info & (PGC_extra | PGC_static)) | PGC_allocated | 1;
> +        pg[i].count_info = (pg[i].count_info & allowed_flags) |
> +                           PGC_allocated | 1;

... here. Maybe "preserved_flags" (or just "preserved")?

> --- a/xen/include/xen/mm.h
> +++ b/xen/include/xen/mm.h
> @@ -299,6 +299,33 @@ page_list_add_tail(struct page_info *page, struct page_list_head *head)
>      }
>      head->tail = page;
>  }
> +static inline void
> +_page_list_add(struct page_info *page, struct page_info *prev,
> +               struct page_info *next)
> +{
> +    page->list.prev = page_to_pdx(prev);
> +    page->list.next = page_to_pdx(next);
> +    prev->list.next = page_to_pdx(page);
> +    next->list.prev = page_to_pdx(page);
> +}
> +static inline void
> +page_list_add_prev(struct page_info *page, struct page_info *next,
> +                   struct page_list_head *head)
> +{
> +    struct page_info *prev;
> +
> +    if ( !next )
> +    {
> +        page_list_add_tail(page, head);
> +        return;
> +    }

!next is ambiguous in its meaning, so a comment towards the intended
behavior here would be helpful. It could be that the tail insertion is
necessary behavior, but it also could be that insertion anywhere would
actually be okay, and tail insertion is merely the variant you ended up
picking.

Then again ...

> +    prev = page_list_prev(next, head);
> +    if ( !prev )
> +        page_list_add(page, head);
> +    else
> +        _page_list_add(page, prev, next);
> +}
>  static inline bool_t
>  __page_list_del_head(struct page_info *page, struct page_list_head *head,
>                       struct page_info *next, struct page_info *prev)
> @@ -451,6 +478,12 @@ page_list_add_tail(struct page_info *page, struct page_list_head *head)
>      list_add_tail(&page->list, head);
>  }
>  static inline void
> +page_list_add_prev(struct page_info *page, struct page_info *next,
> +                   struct page_list_head *head)
> +{
> +    list_add_tail(&page->list, &next->list);

... you don't care about !next here at all?
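
For illustration, a stand-alone sketch of what I'd have expected here (the minimal circular list merely stands in for <xen/list.h>):

```c
#include <stddef.h>

/* Minimal Linux-style circular list, standing in for Xen's <xen/list.h>. */
struct list_head { struct list_head *prev, *next; };

static void init_list_head(struct list_head *h) { h->prev = h->next = h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

struct page_info { struct list_head list; int id; };

/*
 * Sketch of the list.h-backed variant with the !next case handled the
 * same way as in the compressed-pdx variant: fall back to tail insertion.
 */
static void page_list_add_prev(struct page_info *page, struct page_info *next,
                               struct list_head *head)
{
    list_add_tail(&page->list, next ? &next->list : head);
}

/* Helper for inspecting list order in the example below. */
static int nth_id(struct list_head *head, int n)
{
    struct list_head *pos = head->next;

    while ( n-- )
        pos = pos->next;

    return ((struct page_info *)pos)->id;
}
```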

Jan

