All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Beulich <jbeulich@suse.com>
To: "Daniel P. Smith" <dpsmith@apertussolutions.com>
Cc: scott.davis@starlab.io, christopher.clark@starlab.io,
	"Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Roger Pau Monné" <roger.pau@citrix.com>,
	xen-devel@lists.xenproject.org, "Wei Liu" <wl@xen.org>
Subject: Re: [PATCH v1 15/18] x86: rework domain page allocation
Date: Wed, 27 Jul 2022 15:22:08 +0200	[thread overview]
Message-ID: <0832fa09-8b1e-86e8-1291-fd071fe44575@suse.com> (raw)
In-Reply-To: <20220706210454.30096-16-dpsmith@apertussolutions.com>

On 06.07.2022 23:04, Daniel P. Smith wrote:
> This reworks all the dom0 page allocation functions for general domain
> construction. Where possible, common logic between the two was split into a
> separate function for reuse by the two functions.

You absolutely need to mention what behavioral / functional changes there
are (intended), even in case it is "none".

> --- a/xen/arch/x86/dom0_build.c
> +++ b/xen/arch/x86/dom0_build.c
> @@ -320,69 +320,31 @@ static unsigned long __init default_nr_pages(unsigned long avail)
>  }
>  
>  unsigned long __init dom0_compute_nr_pages(
> -    struct domain *d, struct elf_dom_parms *parms, unsigned long initrd_len)
> +    struct boot_domain *bd, struct elf_dom_parms *parms,
> +    unsigned long initrd_len)
>  {
> -    nodeid_t node;
> -    unsigned long avail = 0, nr_pages, min_pages, max_pages, iommu_pages = 0;
> +    unsigned long avail, nr_pages, min_pages, max_pages;
>  
>      /* The ordering of operands is to work around a clang5 issue. */
>      if ( CONFIG_DOM0_MEM[0] && !dom0_mem_set )
>          parse_dom0_mem(CONFIG_DOM0_MEM);
>  
> -    for_each_node_mask ( node, dom0_nodes )
> -        avail += avail_domheap_pages_region(node, 0, 0) +
> -                 initial_images_nrpages(node);
> -
> -    /* Reserve memory for further dom0 vcpu-struct allocations... */
> -    avail -= (d->max_vcpus - 1UL)
> -             << get_order_from_bytes(sizeof(struct vcpu));
> -    /* ...and compat_l4's, if needed. */
> -    if ( is_pv_32bit_domain(d) )
> -        avail -= d->max_vcpus - 1;
> -
> -    /* Reserve memory for iommu_dom0_init() (rough estimate). */
> -    if ( is_iommu_enabled(d) && !iommu_hwdom_passthrough )
> -    {
> -        unsigned int s;
> -
> -        for ( s = 9; s < BITS_PER_LONG; s += 9 )
> -            iommu_pages += max_pdx >> s;
> -
> -        avail -= iommu_pages;
> -    }
> +    avail = dom_avail_nr_pages(bd, dom0_nodes);
>  
> -    if ( paging_mode_enabled(d) || opt_dom0_shadow || opt_pv_l1tf_hwdom )
> +    /* command line overrides configuration */
> +    if (  dom0_mem_set )

Nit: Stray double blanks.

>      {
> -        unsigned long cpu_pages;
> -
> -        nr_pages = get_memsize(&dom0_size, avail) ?: default_nr_pages(avail);
> -
> -        /*
> -         * Clamp according to min/max limits and available memory
> -         * (preliminary).
> -         */
> -        nr_pages = max(nr_pages, get_memsize(&dom0_min_size, avail));
> -        nr_pages = min(nr_pages, get_memsize(&dom0_max_size, avail));
> -        nr_pages = min(nr_pages, avail);
> -
> -        cpu_pages = dom0_paging_pages(d, nr_pages);
> -
> -        if ( !iommu_use_hap_pt(d) )
> -            avail -= cpu_pages;
> -        else if ( cpu_pages > iommu_pages )
> -            avail -= cpu_pages - iommu_pages;

I can't see any of this represented in the new code. Have you gone through
the history of this code, to understand why things are the way they are,
and hence what (corner) cases need to remain behaviorally unchanged?

> @@ -40,6 +42,106 @@ struct vcpu *__init alloc_dom_vcpu0(struct boot_domain *bd)
>  }
>  
>  
> +unsigned long __init dom_avail_nr_pages(
> +    struct boot_domain *bd, nodemask_t nodes)
> +{
> +    unsigned long avail = 0, iommu_pages = 0;
> +    bool is_ctldom = false, is_hwdom = false;
> +    unsigned long nr_pages = bd->meminfo.mem_size.nr_pages;
> +    nodeid_t node;
> +
> +    if ( builder_is_ctldom(bd) )
> +        is_ctldom = true;
> +    if ( builder_is_hwdom(bd) )
> +        is_hwdom = true;
> +
> +    for_each_node_mask ( node, nodes )
> +        avail += avail_domheap_pages_region(node, 0, 0) +
> +                 initial_images_nrpages(node);

I don't think this is suitable for other than Dom0, so I question the
splitting out and generalizing of this logic. For "ordinary" domains
their memory size should be well-defined rather than inferred from
host capacity.

Starting from host capacity also means you become ordering dependent
when it comes to creating (not starting) all the domains: Which one
is to come first? And even with this limited to just Dom0 - is its
size calculated before or after all the other domains were created?

> +    /* Reserve memory for further dom0 vcpu-struct allocations... */

dom0?

> +    avail -= (bd->domain->max_vcpus - 1UL)
> +             << get_order_from_bytes(sizeof(struct vcpu));
> +    /* ...and compat_l4's, if needed. */
> +    if ( is_pv_32bit_domain(bd->domain) )
> +        avail -= bd->domain->max_vcpus - 1;
> +
> +    /* Reserve memory for iommu_dom0_init() (rough estimate). */
> +    if ( is_hwdom && is_iommu_enabled(bd->domain) && !iommu_hwdom_passthrough )

Again the question why this would be Dom0-only.

> +    {
> +        unsigned int s;
> +
> +        for ( s = 9; s < BITS_PER_LONG; s += 9 )
> +            iommu_pages += max_pdx >> s;
> +
> +        avail -= iommu_pages;
> +    }
> +
> +    if ( paging_mode_enabled(bd->domain) ||
> +         (is_ctldom && opt_dom0_shadow) ||
> +         (is_hwdom && opt_pv_l1tf_hwdom) )

An interesting combination of conditions. It (again) looks to me as if
it first needs properly separating Dom0 from hwdom, in an abstract
sense.

> +    {
> +        unsigned long cpu_pages = dom0_paging_pages(bd->domain, nr_pages);
> +
> +        if ( !iommu_use_hap_pt(bd->domain) )
> +            avail -= cpu_pages;
> +        else if ( cpu_pages > iommu_pages )
> +            avail -= cpu_pages - iommu_pages;
> +    }
> +
> +    return avail;
> +}
> +
> +unsigned long __init dom_compute_nr_pages(
> +    struct boot_domain *bd, struct elf_dom_parms *parms,
> +    unsigned long initrd_len)
> +{
> +    unsigned long avail, nr_pages = bd->meminfo.mem_size.nr_pages;
> +
> +    if ( builder_is_initdom(bd) )
> +        return dom0_compute_nr_pages(bd, parms, initrd_len);
> +
> +    avail = dom_avail_nr_pages(bd, node_online_map);
> +
> +    if ( is_pv_domain(bd->domain) && (parms->p2m_base == UNSET_ADDR) )
> +    {
> +        /*
> +         * Legacy Linux kernels (i.e. such without a XEN_ELFNOTE_INIT_P2M
> +         * note) require that there is enough virtual space beyond the initial
> +         * allocation to set up their initial page tables. This space is
> +         * roughly the same size as the p2m table, so make sure the initial
> +         * allocation doesn't consume more than about half the space that's
> +         * available between params.virt_base and the address space end.
> +         */

This duplicates an existing comment (and hence below likely also
existing code) rather than replacing / moving the original. As in
an earlier case - how are the two going to remain in sync?

Jan


  reply	other threads:[~2022-07-27 13:22 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-06 21:04 [PATCH v1 00/18] Hyperlaunch Daniel P. Smith
2022-07-06 21:04 ` [PATCH v1 01/18] kconfig: allow configuration of maximum modules Daniel P. Smith
2022-07-07  1:44   ` Henry Wang
2022-07-15 19:16   ` Julien Grall
2022-07-19 16:36     ` Daniel P. Smith
2022-07-26 18:07       ` Julien Grall
2022-07-27  6:12         ` Jan Beulich
2022-07-19  9:32   ` Jan Beulich
2022-07-19 17:02     ` Daniel P. Smith
2022-07-20  7:27       ` Jan Beulich
2022-07-22 15:00         ` Daniel P. Smith
2022-07-06 21:04 ` [PATCH v1 02/18] introduction of generalized boot info Daniel P. Smith
2022-07-15 19:25   ` Julien Grall
2022-07-20 18:32     ` Daniel P. Smith
2022-07-19 13:11   ` Jan Beulich
2022-07-21 14:28     ` Daniel P. Smith
2022-07-21 16:00       ` Jan Beulich
2022-07-21 16:00       ` Jan Beulich
2022-07-22 16:01         ` Daniel P. Smith
2022-07-25  7:05           ` Jan Beulich
2022-07-06 21:04 ` [PATCH v1 03/18] x86: adopt new boot info structures Daniel P. Smith
2022-07-19 13:19   ` Jan Beulich
2022-07-22 12:34     ` Daniel P. Smith
2022-07-06 21:04 ` [PATCH v1 04/18] x86: refactor entrypoints to new boot info Daniel P. Smith
2022-07-18 13:58   ` Smith, Jackson
2022-07-22 12:59     ` Daniel P. Smith
2022-07-06 21:04 ` [PATCH v1 05/18] x86: refactor xen cmdline into general framework Daniel P. Smith
2022-07-19 13:26   ` Jan Beulich
2022-07-22 13:12     ` Daniel P. Smith
2022-07-25  7:09       ` Jan Beulich
2022-07-06 21:04 ` [PATCH v1 06/18] fdt: make fdt handling reusable across arch Daniel P. Smith
2022-07-07  1:44   ` Henry Wang
2022-07-19  9:36   ` Jan Beulich
2022-07-22 13:18     ` Daniel P. Smith
2022-07-06 21:04 ` [PATCH v1 07/18] docs: update hyperlaunch device tree documentation Daniel P. Smith
2022-07-18 13:57   ` Smith, Jackson
2022-07-22 13:34     ` Daniel P. Smith
2022-07-06 21:04 ` [PATCH v1 08/18] kconfig: introduce domain builder config option Daniel P. Smith
2022-07-07  1:44   ` Henry Wang
2022-07-19 13:29   ` Jan Beulich
2022-07-22 13:47     ` Daniel P. Smith
2022-07-06 21:04 ` [PATCH v1 09/18] x86: introduce abstractions for domain builder Daniel P. Smith
2022-07-26 14:22   ` Jan Beulich
2022-07-06 21:04 ` [PATCH v1 10/18] x86: introduce the " Daniel P. Smith
2022-07-18 13:59   ` Smith, Jackson
2022-07-22 14:36     ` Daniel P. Smith
2022-07-22 20:33       ` Smith, Jackson
2022-07-23 10:45         ` Daniel P. Smith
2022-07-26 14:46   ` Jan Beulich
2022-07-06 21:04 ` [PATCH v1 11/18] x86: initial conversion to " Daniel P. Smith
2022-07-26 15:01   ` Jan Beulich
2022-07-06 21:04 ` [PATCH v1 12/18] x86: convert dom0 creation " Daniel P. Smith
2022-07-27 12:25   ` Jan Beulich
2022-07-06 21:04 ` [PATCH v1 13/18] x86: generalize physmap logic Daniel P. Smith
2022-07-27 12:33   ` Jan Beulich
2022-07-06 21:04 ` [PATCH v1 14/18] x86: generalize vcpu for domain building Daniel P. Smith
2022-07-27 12:46   ` Jan Beulich
2022-07-06 21:04 ` [PATCH v1 15/18] x86: rework domain page allocation Daniel P. Smith
2022-07-27 13:22   ` Jan Beulich [this message]
2022-07-06 21:04 ` [PATCH v1 16/18] x86: add pv multidomain construction Daniel P. Smith
2022-07-27 14:12   ` Jan Beulich
2022-07-06 21:04 ` [PATCH v1 17/18] builder: introduce domain builder hypfs tree Daniel P. Smith
2022-07-27 14:30   ` Jan Beulich
2022-07-06 21:04 ` [PATCH v1 18/18] tools: introduce example late pv helper Daniel P. Smith
2022-07-19 17:06 ` [PATCH v1 00/18] Hyperlaunch Smith, Jackson
2022-07-22 14:51   ` Daniel P. Smith

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0832fa09-8b1e-86e8-1291-fd071fe44575@suse.com \
    --to=jbeulich@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=christopher.clark@starlab.io \
    --cc=dpsmith@apertussolutions.com \
    --cc=roger.pau@citrix.com \
    --cc=scott.davis@starlab.io \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.