All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Beulich <jbeulich@suse.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>, "Wei Liu" <wl@xen.org>,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	"Julien Grall" <julien@xen.org>,
	"Volodymyr Babchuk" <Volodymyr_Babchuk@epam.com>,
	"Bertrand Marquis" <bertrand.marquis@arm.com>,
	"Henry Wang" <Henry.Wang@arm.com>,
	"Anthony PERARD" <anthony.perard@citrix.com>,
	Xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: [PATCH 1/4] xen: Introduce non-broken hypercalls for the p2m pool size
Date: Wed, 26 Oct 2022 15:42:15 +0200	[thread overview]
Message-ID: <ffb8bdb8-f54b-2107-ce1a-775337c172ac@suse.com> (raw)
In-Reply-To: <20221026102018.4144-2-andrew.cooper3@citrix.com>

On 26.10.2022 12:20, Andrew Cooper wrote:
> The existing XEN_DOMCTL_SHADOW_OP_{GET,SET}_ALLOCATION have problems:
> 
>  * All set_allocation() flavours have an overflow-before-widen bug when
>    calculating "sc->mb << (20 - PAGE_SHIFT)".
>  * All flavours have a granularity of of 1M.  This was tolerable when the size
>    of the pool could only be set at the same granularity, but is broken now
>    that ARM has a 16-page stopgap allocation in use.
>  * All get_allocation() flavours round up, and in particular turn 0 into 1,
>    meaning the get op returns junk before a successful set op.
>  * The x86 flavours reject the hypercalls before the VM has vCPUs allocated,
>    despite the pool size being a domain property.

I guess this is merely a remnant and could easily be dropped there.

>  * Even the hypercall names are long-obsolete.
> 
> Implement an interface that doesn't suck, which can be first used to unit test
> the behaviour, and subsequently correct a broken implementation.  The old
> interface will be retired in due course.
> 
> This is part of XSA-409 / CVE-2022-33747.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Xen Security Team <security@xen.org>
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Wei Liu <wl@xen.org>
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Julien Grall <julien@xen.org>
> CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> CC: Bertrand Marquis <bertrand.marquis@arm.com>
> CC: Henry Wang <Henry.Wang@arm.com>
> CC: Anthony PERARD <anthony.perard@citrix.com>
> 
> Name subject to improvement.

paging_{get,set}_mempool_size() for the arch helpers (in particular
fitting better with them living in paging.c as well its multi-purpose use
on x86) and XEN_DOMCTL_{get,set}_paging_mempool_size? Perhaps even the
"mem" could be dropped?

>  ABI not.

With the comment in the public header saying "Users of this interface are
required to identify the granularity by other means" I wonder why the
interface needs to be byte-granular. If the caller needs to know page size
by whatever means, it can as well pass in a page count.

> Future TODOs:
>  * x86 shadow still rounds up.  This is buggy as it's a simultaneous equation
>    with tot_pages which varies over time with ballooning.
>  * x86 PV is weird.  There are no toolstack interact with the shadow pool
>    size, but the "shadow" pool it does come into existence when logdirty (or
>    pv-l1tf) when first enabled.
>  * The shadow+hap logic is in desperate need of deduping.

I have a tiny step towards this queued as post-XSA-410 work, folding HAP's
and shadow's freelist, total_pages, free_pages, and p2m_pages. Here this
would mean {hap,shadow}_get_allocation_bytes() could be done away with,
having the logic exclusively in paging.c.

> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -100,6 +100,14 @@ unsigned int p2m_get_allocation(struct domain *d)
>      return ROUNDUP(nr_pages, 1 << (20 - PAGE_SHIFT)) >> (20 - PAGE_SHIFT);
>  }
>  
> +/* Return the size of the pool, in bytes. */
> +int arch_get_p2m_mempool_size(struct domain *d, uint64_t *size)
> +{
> +    *size = ACCESS_ONCE(d->arch.paging.p2m_total_pages) << PAGE_SHIFT;

This may overflow for Arm32.

> @@ -157,6 +165,25 @@ int p2m_set_allocation(struct domain *d, unsigned long pages, bool *preempted)
>      return 0;
>  }
>  
> +int arch_set_p2m_mempool_size(struct domain *d, uint64_t size)
> +{
> +    unsigned long pages = size >> PAGE_SHIFT;
> +    bool preempted = false;
> +    int rc;
> +
> +    if ( (size & ~PAGE_MASK) ||          /* Non page-sized request? */
> +         pages != (size >> PAGE_SHIFT) ) /* 32-bit overflow? */
> +        return -EINVAL;

Simply "(pages << PAGE_SHIFT) != size"? And then move the check into
common code?

> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -345,6 +345,16 @@ unsigned int hap_get_allocation(struct domain *d)
>              + ((pg & ((1 << (20 - PAGE_SHIFT)) - 1)) ? 1 : 0));
>  }
>  
> +int hap_get_allocation_bytes(struct domain *d, uint64_t *size)
> +{
> +    unsigned long pages = (d->arch.paging.hap.total_pages +
> +                           d->arch.paging.hap.p2m_pages);

Unlike for Arm no ACCESS_ONCE() here? Also the addition can in
principle overflow, because being done only in 32 bits.

> --- a/xen/arch/x86/mm/paging.c
> +++ b/xen/arch/x86/mm/paging.c
> @@ -977,6 +977,45 @@ int __init paging_set_allocation(struct domain *d, unsigned int pages,
>  }
>  #endif
>  
> +int arch_get_p2m_mempool_size(struct domain *d, uint64_t *size)
> +{
> +    int rc;
> +
> +    if ( is_pv_domain(d) )
> +        return -EOPNOTSUPP;
> +
> +    if ( hap_enabled(d) )
> +        rc = hap_get_allocation_bytes(d, size);
> +    else
> +        rc = shadow_get_allocation_bytes(d, size);
> +
> +    return rc;
> +}
> +
> +int arch_set_p2m_mempool_size(struct domain *d, uint64_t size)
> +{
> +    unsigned long pages = size >> PAGE_SHIFT;
> +    bool preempted = false;
> +    int rc;
> +
> +    if ( is_pv_domain(d) )
> +        return -EOPNOTSUPP;

Why? You do say "PV is weird" in a post-commit-message remark, but why
do you want to retain this weirdness? Even if today the tool stack
doesn't set the size when enabling log-dirty mode, I'd view this as a
bug which could be addressed purely in the tool stack if this check
wasn't there.

> +    if ( size & ~PAGE_MASK )             /* Non page-sized request? */
> +        return -EINVAL;
> +
> +    ASSERT(paging_mode_enabled(d));

Not only with the PV aspect in mind - why? It looks reasonable to me
to set the pool size before enabling any paging mode.

> +    paging_lock(d);
> +    if ( hap_enabled(d) )
> +        rc = hap_set_allocation(d, pages, &preempted);
> +    else
> +        rc = shadow_set_allocation(d, pages, &preempted);

Potential truncation from the "unsigned long" -> "unsigned int"
conversions.

Jan


  reply	other threads:[~2022-10-26 13:42 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-26 10:20 [PATCH for-4.17 0/4] XSA-409 fixes Andrew Cooper
2022-10-26 10:20 ` [PATCH 1/4] xen: Introduce non-broken hypercalls for the p2m pool size Andrew Cooper
2022-10-26 13:42   ` Jan Beulich [this message]
2022-10-26 19:22     ` Andrew Cooper
2022-10-26 21:24       ` Julien Grall
2022-10-27  6:56         ` Jan Beulich
2022-10-27  9:27           ` Julien Grall
2022-10-27  7:11       ` Jan Beulich
2022-10-28 15:27         ` George Dunlap
2022-10-31  9:26           ` Jan Beulich
2022-10-31 10:12             ` George Dunlap
2022-11-16  1:19           ` Stefano Stabellini
2022-11-16  8:26             ` Jan Beulich
2022-10-27  7:42   ` Jan Beulich
2022-10-26 10:20 ` [PATCH 2/4] tools/tests: Unit test for " Andrew Cooper
2022-10-26 14:24   ` Jan Beulich
2022-10-26 14:35     ` Andrew Cooper
2022-10-26 10:20 ` [PATCH 3/4] xen/arm, libxl: Revert XEN_DOMCTL_shadow_op; use p2m mempool hypercalls Andrew Cooper
2022-10-26 13:22   ` Jason Andryuk
2022-10-26 13:25     ` Andrew Cooper
2022-11-16  1:37   ` Stefano Stabellini
2022-11-16  1:48     ` Andrew Cooper
2022-11-16  2:00       ` Stefano Stabellini
2022-11-16  2:39         ` Henry Wang
2022-11-16  8:30         ` Jan Beulich
2022-11-16 23:41           ` Andrew Cooper
2022-11-16 23:44             ` Stefano Stabellini
2022-11-16 23:51               ` Julien Grall
2022-11-16 23:56                 ` Stefano Stabellini
2022-11-17  8:18             ` Jan Beulich
2022-10-26 10:20 ` [PATCH 4/4] xen/arm: Correct the p2m pool size calculations Andrew Cooper
2022-11-11 10:11   ` Henry Wang
2022-11-11 10:54     ` Julien Grall

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ffb8bdb8-f54b-2107-ce1a-775337c172ac@suse.com \
    --to=jbeulich@suse.com \
    --cc=Henry.Wang@arm.com \
    --cc=Volodymyr_Babchuk@epam.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=anthony.perard@citrix.com \
    --cc=bertrand.marquis@arm.com \
    --cc=julien@xen.org \
    --cc=roger.pau@citrix.com \
    --cc=sstabellini@kernel.org \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.