All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: "Michał Leszczyński" <michal.leszczynski@cert.pl>,
	"Roger Pau Monné" <roger.pau@citrix.com>, "Wei Liu" <wl@xen.org>,
	"Anthony PERARD" <anthony.perard@citrix.com>,
	"Tamas K Lengyel" <tamas@tklengyel.com>,
	Xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: [PATCH v8 08/16] xen/domain: Add vmtrace_size domain creation parameter
Date: Mon, 1 Feb 2021 14:22:56 +0000	[thread overview]
Message-ID: <f54dec0a-65b6-07bf-9de8-ed96ffd8d791@citrix.com> (raw)
In-Reply-To: <3cf886f6-db7f-ccc1-5ef0-6fd8ccb38caf@suse.com>

On 01/02/2021 13:18, Jan Beulich wrote:
> On 30.01.2021 03:58, Andrew Cooper wrote:
>> +static int vmtrace_alloc_buffer(struct vcpu *v)
>> +{
>> +    struct domain *d = v->domain;
>> +    struct page_info *pg;
>> +    unsigned int i;
>> +
>> +    if ( !d->vmtrace_size )
>> +        return 0;
>> +
>> +    pg = alloc_domheap_pages(d, get_order_from_bytes(d->vmtrace_size),
>> +                             MEMF_no_refcount);
>> +    if ( !pg )
>> +        return -ENOMEM;
>> +
>> +    /*
>> +     * Getting the reference counting correct here is hard.
>> +     *
>> +     * All pages are now on the domlist.  They, or subranges within, will be
> "domlist" is too imprecise, as there's no list with this name. It's
> extra_page_list in this case (see also below).
>
>> +     * freed when their reference count drops to zero, which may any time
>> +     * between now and the domain teardown path.
>> +     */
>> +
>> +    for ( i = 0; i < (d->vmtrace_size >> PAGE_SHIFT); i++ )
>> +        if ( unlikely(!get_page_and_type(&pg[i], d, PGT_writable_page)) )
>> +            goto refcnt_err;
>> +
>> +    /*
>> +     * We must only let vmtrace_free_buffer() take any action in the success
>> +     * case when we've taken all the refs it intends to drop.
>> +     */
>> +    v->vmtrace.pg = pg;
>> +
>> +    return 0;
>> +
>> + refcnt_err:
>> +    /*
>> +     * In the failure case, we must drop all the acquired typerefs thus far,
>> +     * skip vmtrace_free_buffer(), and leave domain_relinquish_resources() to
>> +     * drop the alloc refs on any remaining pages - some pages could already
>> +     * have been freed behind our backs.
>> +     */
>> +    while ( i-- )
>> +        put_page_and_type(&pg[i]);
>> +
>> +    return -ENODATA;
>> +}
> As said in reply on the other thread, PGC_extra pages don't get
> freed automatically. I too initially thought they would, but
> (re-)learned otherwise when trying to repro your claims on that
> other thread. For all pages you've managed to get the writable
> ref, freeing is easily done by prefixing the loop body above by
> put_page_alloc_ref(). For all other pages best you can do (I
> think; see the debugging patches I had sent on that other
> thread) is to try get_page() - if it succeeds, calling
> put_page_alloc_ref() is allowed. Otherwise we can only leak the
> respective page (unless going to further extents with trying to
> recover from the "impossible"), or assume the failure here was
> because it did get freed already.

Right - I'm going to insist on breaking apart orthogonal issues.

This refcounting issue isn't introduced by this series - this series
uses an established pattern, in which we've found a corner case.

The corner case is theoretical, not practical - it is not possible for a
malicious PV domain to take 2^43 refs on any of the pages in this
allocation.  Doing so would require an hours-long SMI, or equivalent,
and even then all malicious activity would be paused after 1s for the
time calibration rendezvous which would livelock the system until the
watchdog kicked in.


I will drop the comments, because in light of this discovery, they're
not correct.

We should fix the corner case, but that should be a separate patch. 
Whatever we do needs to start by writing down the the refcounting rules
first because its totally clear that noone understands them, and then a
change to all examples of this pattern adjusting (if necessary).

~Andrew


  reply	other threads:[~2021-02-01 14:23 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-30  2:58 [PATCH v8 00/16] acquire_resource size and external IPT monitoring Andrew Cooper
2021-01-30  2:58 ` [PATCH v8 01/16] xen/memory: Reject out-of-range resource 'frame' values Andrew Cooper
2021-01-30  2:58 ` [PATCH v8 02/16] xen/gnttab: Rework resource acquisition Andrew Cooper
2021-01-30  2:58 ` [PATCH v8 03/16] xen/memory: Fix acquire_resource size semantics Andrew Cooper
2021-01-30  2:58 ` [PATCH v8 04/16] xen/memory: Improve compat XENMEM_acquire_resource handling Andrew Cooper
2021-01-30  2:58 ` [PATCH v8 05/16] xen/memory: Indent part of acquire_resource() Andrew Cooper
2021-01-30  2:58 ` [PATCH v8 06/16] xen/memory: Fix mapping grant tables with XENMEM_acquire_resource Andrew Cooper
2021-02-01 10:10   ` Roger Pau Monné
2021-02-01 11:11     ` Andrew Cooper
2021-02-01 12:07       ` Roger Pau Monné
2021-02-01 12:10         ` Andrew Cooper
2021-02-01 13:03       ` Jan Beulich
2021-02-01 14:04         ` Andrew Cooper
2021-02-01 14:32           ` Jan Beulich
2021-01-30  2:58 ` [PATCH v8 07/16] xen+tools: Introduce XEN_SYSCTL_PHYSCAP_vmtrace Andrew Cooper
2021-02-01 10:32   ` Roger Pau Monné
2021-01-30  2:58 ` [PATCH v8 08/16] xen/domain: Add vmtrace_size domain creation parameter Andrew Cooper
2021-02-01 10:51   ` Roger Pau Monné
2021-02-01 11:19     ` Andrew Cooper
2021-02-01 13:18   ` Jan Beulich
2021-02-01 14:22     ` Andrew Cooper [this message]
2021-02-01 14:36       ` Jan Beulich
2021-02-01 22:14         ` Andrew Cooper
2021-01-30  2:58 ` [PATCH v8 09/16] tools/[lib]xl: Add vmtrace_buf_size parameter Andrew Cooper
2021-01-30  2:58 ` [PATCH v8 10/16] xen/memory: Add a vmtrace_buf resource type Andrew Cooper
2021-01-30  2:58 ` [PATCH v8 11/16] x86/vmx: Add Intel Processor Trace support Andrew Cooper
2021-01-30  2:58 ` [PATCH v8 12/16] xen/domctl: Add XEN_DOMCTL_vmtrace_op Andrew Cooper
2021-02-01 12:01   ` Roger Pau Monné
2021-02-01 13:00     ` Andrew Cooper
2021-02-01 14:27       ` Roger Pau Monné
2021-01-30  2:58 ` [PATCH v8 13/16] tools/libxc: Add xc_vmtrace_* functions Andrew Cooper
2021-01-30  2:58 ` [PATCH v8 14/16] tools/misc: Add xen-vmtrace tool Andrew Cooper
2021-01-30  2:58 ` [PATCH v8 15/16] xen/vmtrace: support for VM forks Andrew Cooper
2021-01-30  2:58 ` [PATCH v8 16/16] x86/vm_event: Carry the vmtrace buffer position in vm_event Andrew Cooper
2021-02-01  9:51   ` Jan Beulich
2021-02-01 12:34 ` [PATCH v8 00/16] acquire_resource size and external IPT monitoring Oleksandr
2021-02-01 13:07   ` Andrew Cooper
2021-02-01 13:47     ` Oleksandr
2021-02-01 14:00       ` Andrew Cooper
2021-02-02 12:09         ` Oleksandr

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f54dec0a-65b6-07bf-9de8-ed96ffd8d791@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=anthony.perard@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=michal.leszczynski@cert.pl \
    --cc=roger.pau@citrix.com \
    --cc=tamas@tklengyel.com \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.