All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christopher Clark <christopher.w.clark@gmail.com>
To: "Roger Pau Monné" <royger@freebsd.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>,
	Wei Liu <wei.liu2@citrix.com>,
	Ross Philipson <ross.philipson@gmail.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	George Dunlap <George.Dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Jason Andryuk <jandryuk@gmail.com>,
	Ian Jackson <ian.jackson@eu.citrix.com>,
	Julien Grall <julien.grall@arm.com>, Tim Deegan <tim@xen.org>,
	Daniel Smith <dpsmith@apertussolutions.com>,
	Rich Persaud <persaur@gmail.com>,
	Paul Durrant <paul.durrant@citrix.com>,
	Jan Beulich <jbeulich@suse.com>,
	xen-devel <xen-devel@lists.xenproject.org>,
	James McKenzie <james@bromium.com>,
	Eric Chanudet <eric.chanudet@gmail.com>,
	Roger Pau Monne <roger.pau@citrix.com>
Subject: Re: [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq
Date: Wed, 9 Jan 2019 19:09:43 -0800	[thread overview]
Message-ID: <CACMJ4GaAW+V9JFNBP=O7ae_fsnJN-SzZU_t=Ry1rgPDkx4NpXw@mail.gmail.com> (raw)
In-Reply-To: <CAPLaKK4_Uw=g8LtkKhfCG3uTKEiCuJgcp9M_amWgch-N4+zSng@mail.gmail.com>

Thanks for the review, Roger. Replies inline below.

On Wed, Jan 9, 2019 at 10:57 AM Roger Pau Monné <royger@freebsd.org> wrote:
>
>  to.On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark
> <christopher.w.clark@gmail.com> wrote:
> >
> > sendv operation is invoked to perform a synchronous send of buffers
> > contained in iovs to a remote domain's registered ring.
> >
> > diff --git a/xen/common/argo.c b/xen/common/argo.c
> > index 59ce8c4..4548435 100644
> > --- a/xen/common/argo.c
> > +++ b/xen/common/argo.c

> >
> > +static int
> > +memcpy_to_guest_ring(struct argo_ring_info *ring_info, uint32_t offset,
> > +                     const void *src, XEN_GUEST_HANDLE(uint8_t) src_hnd,
> > +                     uint32_t len)
> > +{
> > +    unsigned int mfns_index = offset >> PAGE_SHIFT;
> > +    void *dst;
> > +    int ret;
> > +    unsigned int src_offset = 0;
> > +
> > +    ASSERT(spin_is_locked(&ring_info->lock));
> > +
> > +    offset &= ~PAGE_MASK;
> > +
> > +    if ( (len > XEN_ARGO_MAX_RING_SIZE) || (offset > XEN_ARGO_MAX_RING_SIZE) )
> > +        return -EFAULT;
> > +
> > +    while ( (offset + len) > PAGE_SIZE )
>
> I think you could map the whole ring in contiguous virtual address
> space and then writing to it would be much more easy, you wouldn't
> need to iterate with memcpy or copy_from_guest, take a look at __vmap.
> You could likely map this when the ring gets setup and keep it mapped
> for the lifetime of the ring.

You're right about that, because map_domain_page_global, which the
current code uses, uses vmap itself. I think there's a couple of
reasons why the code has been implemented the iterative way though.

The first is that I think ring resize has been a consideration: it's
useful to be able to increase the size of a live and active ring that
is under load without having to tear down the mappings, find a new
virtual address region of the right size and then remap it: you can
just supply some more memory and map those pages onto the end of the
ring, and ensure both sides know about the new ring size. Similarly,
shrinking a quiet ring can be useful.
However, the "gfn race" that you (correctly) pointed out in an earlier
review, and Jan's related request to drop the "revalidate an existing
mapping on ring reregister" motivated removal of a section of the code
involved, and then in v3 of the series, I've actually just blocked
ring resize because it's missing a walk through the pending
notifications to find any that have become untriggerable with the new
ring size when a ring is shrunk and I'd like to defer implementing
that for now. So the ring resize reason is more of a consideration for
a possible later version of Argo than the current one.

The second reason is about avoiding exposing the Xen virtual memory
allocator directly to frequent guest-supplied size requests for
contiguous regions (of up to 16GB). With single-page allocations to
build a ring, fragmentation is not a problem, and mischief by a guest
seems difficult. Changing it to issue requests for contiguous regions,
with variable ring sizes up to the maximum of 16GB, it seems like
significant fragmentation may be achievable. I don't know the
practical impact of that but it seems worth avoiding. Are the other
users of __vmap (or vmap) for multi-gigabyte regions only either
boot-time, infrequent operations (livepatch), or for actions by
privileged (ie. somewhat trusted) domains (ioremap), or is it already
a frequent operation somewhere else?

Given the context above, and Jason's simplification to the
memcpy_to_guest_ring function, plus the imminent merge freeze
deadline, and the understanding that this loop and the related data
structures supporting it have been tested and are working, would it be
acceptable to omit making this contiguous mapping change from this
current series?

>
> > +    {
> > +        unsigned int head_len = PAGE_SIZE - offset;
> > +
> > +        ret = ring_map_page(ring_info, mfns_index, &dst);
> > +        if ( ret )
> > +            return ret;
> > +
> > +        if ( src )
> > +        {
> > +            memcpy(dst + offset, src + src_offset, head_len);
> > +            src_offset += head_len;
> > +        }
> > +        else
> > +        {
> > +            ret = copy_from_guest(dst + offset, src_hnd, head_len) ?
> > +                    -EFAULT : 0;
> > +            if ( ret )
> > +                return ret;
>
> You can simplify this to:
>
> if ( copy_from_guest(...) )
>     return -EFAULT;

yes! ack - thanks

<snip>
> > +/*
> > + * get_sanitized_ring creates a modified copy of the ring pointers where
> > + * the rx_ptr is rounded up to ensure it is aligned, and then ring
> > + * wrap is handled. Simplifies safe use of the rx_ptr for available
> > + * space calculation.
> > + */
> > +static int
> > +get_sanitized_ring(xen_argo_ring_t *ring, struct argo_ring_info *ring_info)
> > +{
> > +    uint32_t rx_ptr;
> > +    int ret;
> > +
> > +    ret = get_rx_ptr(ring_info, &rx_ptr);
> > +    if ( ret )
> > +        return ret;
> > +
> > +    ring->tx_ptr = ring_info->tx_ptr;
> > +
> > +    rx_ptr = ROUNDUP_MESSAGE(rx_ptr);
> > +    if ( rx_ptr >= ring_info->len )
> > +        rx_ptr = 0;
> > +
> > +    ring->rx_ptr = rx_ptr;
>
> Newline.

ack, thanks

<snip>
> > +/*
> > + * iov_count returns its count on success via an out variable to avoid
> > + * potential for a negative return value to be used incorrectly
> > + * (eg. coerced into an unsigned variable resulting in a large incorrect value)
> > + */
> > +static int
> > +iov_count(const xen_argo_iov_t *piov, unsigned long niov, uint32_t *count)
> > +{
> > +    uint32_t sum_iov_lens = 0;
> > +
> > +    if ( niov > XEN_ARGO_MAXIOV )
> > +        return -EINVAL;
> > +
> > +    while ( niov-- )
>
> I would use a for loop here, that would remove the need to piov++, if
> you want to keep it quite similar:
>
> for ( ; niov--; piov++ )
> {

Yes, that is better - thanks, applied.

<snip>
> > +
> > +static int
> > +ringbuf_insert(struct domain *d, struct argo_ring_info *ring_info,
> > +               const struct argo_ring_id *src_id,
> > +               XEN_GUEST_HANDLE_PARAM(xen_argo_iov_t) iovs_hnd,
> > +               unsigned long niov, uint32_t message_type,
> > +               unsigned long *out_len)
> > +{
> > +    xen_argo_ring_t ring;
> > +    struct xen_argo_ring_message_header mh = { 0 };
>
> No need for the 0, { } will achieve exactly the same.

ack, applied

>
> > +    int32_t sp;
> > +    int32_t ret;
> > +    uint32_t len = 0;
> > +    xen_argo_iov_t iovs[XEN_ARGO_MAXIOV];
>
> This seems slightly dangerous, a change of the maximum could cause
> stack overflow depending on the size of xen_argo_iov_t. I think you
> need some comment next to definition of XEN_ARGO_MAXIOV to note that
> increasing this could cause issues.

That makes sense, will do.

<snip>
> > +    /*
> > +     * First data write into the destination ring: fixed size, message header.
> > +     * This cannot overrun because the available free space (value in 'sp')
> > +     * is checked above and must be at least this size.
> > +     */
> > +    ret = memcpy_to_guest_ring(ring_info, ring.tx_ptr + sizeof(xen_argo_ring_t),
> > +                               &mh, NULL_hnd, sizeof(mh));
> > +    if ( ret )
> > +    {
> > +        gprintk(XENLOG_ERR,
> > +                "argo: failed to write message header to ring (vm%u:%x vm%d)\n",
> > +                ring_info->id.domain_id, ring_info->id.port,
> > +                ring_info->id.partner_id);
> > +
> > +        goto out;
> > +    }
> > +
> > +    ring.tx_ptr += sizeof(mh);
> > +    if ( ring.tx_ptr == ring_info->len )
> > +        ring.tx_ptr = 0;
> > +
> > +    piov = iovs;
> > +
> > +    while ( niov-- )
>
> AFAICT using a for loop would remove the need to also do a piov++ at
> each iteration.

ack, applied.

<snip>
> > +         * Case 2: ring-tail-wrap-write above was not performed
> > +         *    -> so iov_len is the guest-supplied value and: (iov_len <= sp)
> > +         *    ie. less than available space at the tail of the ring:
> > +         *        so this write cannot overrun.
> > +         */
> > +        ret = memcpy_to_guest_ring(ring_info,
> > +                                   ring.tx_ptr + sizeof(xen_argo_ring_t),
> > +                                   NULL, buf_hnd, iov_len);
> > +        if ( ret )
> > +        {
> > +            gprintk(XENLOG_ERR,
> > +                    "argo: failed to copy [%p, %"PRIx32"] (vm%u:%x vm%d)\n",
> > +                    buf_hnd.p, iov_len, ring_info->id.domain_id,
> > +                    ring_info->id.port, ring_info->id.partner_id);
> > +
> > +            goto out;
> > +        }
> > +
> > +        ring.tx_ptr += iov_len;
> > +
> > +        if ( ring.tx_ptr == ring_info->len )
> > +            ring.tx_ptr = 0;
> > +
> > +        piov++;
> > +    }
> > +
> > +    ring.tx_ptr = ROUNDUP_MESSAGE(ring.tx_ptr);
> > +
> > +    if ( ring.tx_ptr >= ring_info->len )
> > +        ring.tx_ptr -= ring_info->len;
> > +
> > +    update_tx_ptr(ring_info, ring.tx_ptr);
> > +
> > + out:
>
> Do you really need to out label? *out_len it's only set in the success
> case, so all the error cases that use a 'goto out' could be replaced
> by 'return ret;'.

ack, thanks -- done.

<snip>
> > +static int
> > +pending_queue(struct argo_ring_info *ring_info, domid_t src_id,
> > +              unsigned int len)
> > +{
> > +    struct pending_ent *ent;
> > +
> > +    ASSERT(spin_is_locked(&ring_info->lock));
> > +
> > +    if ( ring_info->npending >= MAX_PENDING_PER_RING )
> > +        return -ENOSPC;
> > +
> > +    ent = xmalloc(struct pending_ent);
> > +
>
> Extra newline.

ack

<snip>
> >
> > +static long
> > +sendv(struct domain *src_d, const xen_argo_addr_t *src_addr,
> > +      const xen_argo_addr_t *dst_addr,
> > +      XEN_GUEST_HANDLE_PARAM(xen_argo_iov_t) iovs_hnd, unsigned long niov,
> > +      uint32_t message_type)
> > +{
> > +    struct domain *dst_d = NULL;
> > +    struct argo_ring_id src_id;
> > +    struct argo_ring_info *ring_info;
> > +    int ret = 0;
> > +    unsigned long len = 0;
> > +
> > +    ASSERT(src_d->domain_id == src_addr->domain_id);
> > +
> > +    argo_dprintk("sendv: (%d:%x)->(%d:%x) niov:%lu iov:%p type:%u\n",
> > +                 src_addr->domain_id, src_addr->port,
> > +                 dst_addr->domain_id, dst_addr->port,
> > +                 niov, iovs_hnd.p, message_type);
> > +
> > +    read_lock(&argo_lock);
> > +
> > +    if ( !src_d->argo )
> > +    {
> > +        ret = -ENODEV;
> > +        goto out_unlock;
> > +    }
> > +
> > +    src_id.port = src_addr->port;
> > +    src_id.domain_id = src_d->domain_id;
> > +    src_id.partner_id = dst_addr->domain_id;
> > +
> > +    dst_d = get_domain_by_id(dst_addr->domain_id);
> > +    if ( !dst_d )
> > +    {
> > +        argo_dprintk("!dst_d, ESRCH\n");
> > +        ret = -ESRCH;
> > +        goto out_unlock;
> > +    }
> > +
> > +    if ( !dst_d->argo )
> > +    {
> > +        argo_dprintk("!dst_d->argo, ECONNREFUSED\n");
> > +        ret = -ECONNREFUSED;
> > +        goto out_unlock;
>
> The usage of out_unlock here and in the condition above is wrong,
> since it will unconditionally call read_unlock(&argo_lock); which is
> wrong here because the lock has not yet been acquired.

Sorry, I don't think that's quite right -- if you scroll up a bit
here, you can see where argo_lock is taken unconditionally, just after
the dprintk and before checking whether src_d is argo enabled. The
second lock hasn't been taken yet - but that's not the one being
unlocked on that out_unlock path.

>
> > +    }
> > +
> > +    read_lock(&dst_d->argo->lock);
> > +
> > +    ring_info = ring_find_info_by_match(dst_d, dst_addr->port,
> > +                                        src_addr->domain_id);
> > +    if ( !ring_info )
> > +    {
> > +        gprintk(XENLOG_ERR,
> > +                "argo: vm%u connection refused, src (vm%u:%x) dst (vm%u:%x)\n",
> > +                current->domain->domain_id, src_id.domain_id, src_id.port,
> > +                dst_addr->domain_id, dst_addr->port);
> > +
> > +        ret = -ECONNREFUSED;
> > +        goto out_unlock2;
> > +    }
> > +
> > +    spin_lock(&ring_info->lock);
> > +
> > +    ret = ringbuf_insert(dst_d, ring_info, &src_id, iovs_hnd, niov,
> > +                         message_type, &len);
> > +    if ( ret == -EAGAIN )
> > +    {
> > +        argo_dprintk("argo_ringbuf_sendv failed, EAGAIN\n");
> > +        /* requeue to issue a notification when space is there */
> > +        ret = pending_requeue(ring_info, src_addr->domain_id, len);
> > +    }
> > +
> > +    spin_unlock(&ring_info->lock);
> > +
> > +    if ( ret >= 0 )
> > +        signal_domain(dst_d);
> > +
> > + out_unlock2:
>
> There's only a single user of the out_unlock2 label, at which point it
> might be easier to read to just put the read_unlock there and just use
> the existing out_unlock label.

ack, will change that.

Thanks again,

Christopher

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

  reply	other threads:[~2019-01-10  3:10 UTC|newest]

Thread overview: 104+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-07  7:42 [PATCH v3 00/15] Argo: hypervisor-mediated interdomain communication Christopher Clark
2019-01-07  7:42 ` [PATCH v3 01/15] argo: Introduce the Kconfig option to govern inclusion of Argo Christopher Clark
2019-01-08 15:46   ` Jan Beulich
2019-01-07  7:42 ` [PATCH v3 02/15] argo: introduce the argo_op hypercall boilerplate Christopher Clark
2019-01-07  7:42 ` [PATCH v3 03/15] argo: define argo_dprintk for subsystem debugging Christopher Clark
2019-01-08 15:50   ` Jan Beulich
2019-01-10  9:28   ` Roger Pau Monné
2019-01-07  7:42 ` [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt Christopher Clark
2019-01-08 22:08   ` Ross Philipson
2019-01-08 22:23     ` Christopher Clark
2019-01-08 22:54   ` Jason Andryuk
2019-01-09  6:48     ` Christopher Clark
2019-01-09 14:15       ` Jason Andryuk
2019-01-09 23:24         ` Christopher Clark
2019-01-09  9:35     ` Jan Beulich
2019-01-09 14:26       ` Jason Andryuk
2019-01-09 14:38         ` Jan Beulich
2019-01-10 23:29           ` Christopher Clark
2019-01-10 10:19   ` Roger Pau Monné
2019-01-10 11:52     ` Jan Beulich
2019-01-10 12:26       ` Roger Pau Monné
2019-01-10 12:46         ` Jan Beulich
2019-01-11  6:03     ` Christopher Clark
2019-01-11  9:27       ` Roger Pau Monné
2019-01-14  8:32         ` Christopher Clark
2019-01-14 11:32           ` Roger Pau Monné
2019-01-14 14:28             ` Rich Persaud
2019-01-10 16:16   ` Eric Chanudet
2019-01-11  6:05     ` Christopher Clark
2019-01-11 11:54   ` Jan Beulich
2019-01-14  8:33     ` Christopher Clark
2019-01-14 14:46   ` Wei Liu
2019-01-14 15:29     ` Lars Kurth
2019-01-14 18:16     ` Christopher Clark
2019-01-14 19:42     ` Roger Pau Monné
2019-02-04 20:56     ` Christopher Clark
2019-02-05 10:32       ` Wei Liu
2019-01-14 14:58   ` Andrew Cooper
2019-01-14 15:12     ` Jan Beulich
2019-01-15  7:24       ` Christopher Clark
2019-01-15  7:21     ` Christopher Clark
2019-01-15  9:01       ` Jan Beulich
2019-01-15  9:06         ` Andrew Cooper
2019-01-15  9:17           ` Jan Beulich
2019-01-07  7:42 ` [PATCH v3 05/15] errno: add POSIX error codes EMSGSIZE, ECONNREFUSED to the ABI Christopher Clark
2019-01-07  7:42 ` [PATCH v3 06/15] xen/arm: introduce guest_handle_for_field() Christopher Clark
2019-01-08 22:03   ` Stefano Stabellini
2019-01-07  7:42 ` [PATCH v3 07/15] argo: implement the register op Christopher Clark
2019-01-09 15:55   ` Wei Liu
2019-01-09 16:00     ` Christopher Clark
2019-01-09 17:02       ` Julien Grall
2019-01-09 17:18         ` Stefano Stabellini
2019-01-09 18:13           ` Julien Grall
2019-01-09 20:33             ` Christopher Clark
2019-01-09 17:54         ` Wei Liu
2019-01-09 18:28           ` Julien Grall
2019-01-09 20:38             ` Christopher Clark
2019-01-10 11:24   ` Roger Pau Monné
2019-01-10 11:57     ` Jan Beulich
2019-01-11  6:30       ` Christopher Clark
2019-01-11  6:29     ` Christopher Clark
2019-01-11  9:38       ` Roger Pau Monné
2019-01-10 20:11   ` Eric Chanudet
2019-01-11  6:09     ` Christopher Clark
2019-01-14 14:19   ` Jan Beulich
2019-01-15  7:56     ` Christopher Clark
2019-01-15  8:36       ` Jan Beulich
2019-01-15  8:46         ` Christopher Clark
2019-01-14 15:31   ` Andrew Cooper
2019-01-15  8:02     ` Christopher Clark
2019-01-07  7:42 ` [PATCH v3 08/15] argo: implement the unregister op Christopher Clark
2019-01-10 11:40   ` Roger Pau Monné
2019-01-15  8:05     ` Christopher Clark
2019-01-14 15:06   ` Jan Beulich
2019-01-15  8:11     ` Christopher Clark
2019-01-07  7:42 ` [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq Christopher Clark
2019-01-09 18:05   ` Jason Andryuk
2019-01-10  2:08     ` Christopher Clark
2019-01-09 18:57   ` Roger Pau Monné
2019-01-10  3:09     ` Christopher Clark [this message]
2019-01-10 12:01       ` Roger Pau Monné
2019-01-10 12:13         ` Jan Beulich
2019-01-10 12:40           ` Roger Pau Monné
2019-01-10 12:53             ` Jan Beulich
2019-01-11  6:37               ` Christopher Clark
2019-01-10 21:41   ` Eric Chanudet
2019-01-11  7:12     ` Christopher Clark
2019-01-07  7:42 ` [PATCH v3 10/15] argo: implement the notify op Christopher Clark
2019-01-10 12:21   ` Roger Pau Monné
2019-01-15  6:53     ` Christopher Clark
2019-01-15  8:06       ` Roger Pau Monné
2019-01-15  8:32         ` Christopher Clark
2019-01-07  7:42 ` [PATCH v3 11/15] xsm, argo: XSM control for argo register Christopher Clark
2019-01-07  7:42 ` [PATCH v3 12/15] xsm, argo: XSM control for argo message send operation Christopher Clark
2019-01-07  7:42 ` [PATCH v3 13/15] xsm, argo: XSM control for any access to argo by a domain Christopher Clark
2019-01-07  7:42 ` [PATCH v3 14/15] xsm, argo: notify: don't describe rings that cannot be sent to Christopher Clark
2019-01-07  7:42 ` [PATCH v3 15/15] argo: validate hypercall arg structures via compat machinery Christopher Clark
2019-01-14 12:57   ` Jan Beulich
2019-01-17  7:22     ` Christopher Clark
2019-01-17 11:25       ` Jan Beulich
2019-01-20 21:18         ` Christopher Clark
2019-01-21 12:03           ` Jan Beulich
2019-01-22 11:08             ` Jan Beulich
2019-01-23 21:14               ` Christopher Clark

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACMJ4GaAW+V9JFNBP=O7ae_fsnJN-SzZU_t=Ry1rgPDkx4NpXw@mail.gmail.com' \
    --to=christopher.w.clark@gmail.com \
    --cc=George.Dunlap@eu.citrix.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=dpsmith@apertussolutions.com \
    --cc=eric.chanudet@gmail.com \
    --cc=ian.jackson@eu.citrix.com \
    --cc=james@bromium.com \
    --cc=jandryuk@gmail.com \
    --cc=jbeulich@suse.com \
    --cc=julien.grall@arm.com \
    --cc=konrad.wilk@oracle.com \
    --cc=paul.durrant@citrix.com \
    --cc=persaur@gmail.com \
    --cc=roger.pau@citrix.com \
    --cc=ross.philipson@gmail.com \
    --cc=royger@freebsd.org \
    --cc=sstabellini@kernel.org \
    --cc=tim@xen.org \
    --cc=wei.liu2@citrix.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.