Re: [PATCH (V9) 2/2] xen: Add V4V implementation

From: Ross Philipson <ross.philipson@citrix.com>
To: xen-devel@lists.xen.org
Cc: Tim Deegan <tim@xen.org>
Subject: Re: [PATCH (V9) 2/2] xen: Add V4V implementation
Date: Tue, 4 Jun 2013 14:01:39 -0400	[thread overview]
Message-ID: <51AE2B83.1070808@citrix.com> (raw)
In-Reply-To: <20130530162009.GB64390@ocelot.phlegethon.org>

On 05/30/2013 12:20 PM, Tim Deegan wrote:
> Hi,
>
> Nice to see this series again. :)  Thanks for your work cleaning it up.

Thanks, and thanks for your feedback.

>
> At 15:43 -0400 on 28 May (1369755811), Ross Philipson wrote:
>> Setup of v4v domains a domain gets created and cleanup
>> when a domain die. Wire up the v4v hypercall.
>>
>> Include v4v internal and public headers.
>>
>> Signed-off-by: Ross Philipson<ross.philipson@citrix.com>
>
> I'm a bit worried about resource exhaustion attacks in this interface.
> In particular:
>   - It looks like a malicious guest can register large numbers of rings
>     (like, 2^48 of them), each of which causes Xen to xmalloc bookkeeping
>     space (up to 2^22 bytes per ring).
>   - A malicious guest that has enough memory can send a very large vector
>     (2^31 1-byte entries), which would cause Xen to read and check 2^37
>     bytes of control structures before even starting the copy of the data.
>     At a rough guess that could be>10seconds without checking for preemption.
>     Maybe reduce the limits a bit (say, 32MB per hypercall, may 100 elements
>     per vector)?
>   - Something similar for V4VOP_notify, I guess, though that could perhaps
>     be handled by the usual pending-events check + hypercall continuations.

Yes we agree. I think the plan is to have some reasonable default limits 
and allow the values to be adjusted during domain creation if that is 
needed. This would be done through a new V4V op for the hypercall.

>
> And now, some detailed review:
>
>> +/*
>> + * Messages on the ring are padded to 128 bits
>> + * Len here refers to the exact length of the data not including the
>> + * 128 bit header. The message uses
>> + * ((len +0xf)&  ~0xf) + sizeof(v4v_ring_message_header) bytes
>> + */
>> +#define V4V_ROUNDUP(a) (((a) +0xf )&  ~0xf)
>
> Even though this is always used on values<2^32 and so is safe,
> it would be nice to have it as (((a) + 0xf)&  ~(typeof (a))0xf)
> so it's clear that it doesn't truncate any high-order bits.

Ack

>
> [...]
>> +/*
>> + * lock to protect the filtering rules list: v4vtable_rules
>> + *
>> + * The write lock is held for viptables_del and viptables_add
>> + * The read lock is held for viptable_list
>
> s/viptables/v4vtables/g/

For all the minor ones I don't address directly, just assume I will fix 
it per your suggestions.

>
> [...]
>> +static int
>> +v4v_ringbuf_get_rx_ptr(struct domain *d, struct v4v_ring_info *ring_info,
>> +                        uint32_t * rx_ptr)
>> +{
>> +    v4v_ring_t *ringp;
>> +
>> +    if ( ring_info->npage == 0 )
>> +        return -1;
>> +
>> +    ringp = map_domain_page(mfn_x(ring_info->mfns[0]));
>> +
>> +    v4v_dprintk("v4v_ringbuf_payload_space: mapped %p to %p\n",
>> +                (void *)mfn_x(ring_info->mfns[0]), ringp);
>
> The function name is wrong there, presumably after this code was
> refctored.  You might add %s/__func__ to the definition of v4v_dprintk()
> instead?
>
>> +    if ( !ringp )
>> +        return -1;
>> +
>> +    write_atomic(rx_ptr, ringp->rx_ptr);
>
> AIUI ringp->rx_ptr is the shared variable here, so this should be
>    *rx_ptr = read_atomic(ringp->rx_ptr);
>
> Also, maybe comment in the hypercall interface that the guest should
> also use atomic operations when accessing rx_ptr and tx_ptr.  (Though I
> should say the hypercall comments are already pretty good compared to
> some things we already have!)

What you say sounds right. I will address this.

>
>> +    mb();
>
> I'm not 100% convinced this barrier is needed (since we're not going to
> read the 'free' memory in the ring and I don't think that writes to it
> can possibly be hoisted past this read since they're gated on the
> result).  OTOH I'm happy to err on the side of caution. :)

This could be cargo. There have been quite a number of modifications to 
this code since its birth in our product 3+ years ago. But if I cannot 
be sure, I will probably leave it be.

>
>> +    unmap_domain_page(ringp);
>> +    return 0;
>> +}
>> +
>> +uint32_t
>> +v4v_ringbuf_payload_space(struct domain * d, struct v4v_ring_info * ring_info)
>> +{
>> +    v4v_ring_t ring;
>> +    int32_t ret;
>> +
>> +    ring.tx_ptr = ring_info->tx_ptr;
>> +    ring.len = ring_info->len;
>> +
>> +    if ( v4v_ringbuf_get_rx_ptr(d, ring_info,&ring.rx_ptr) )
>> +        return 0;
>> +
>> +    v4v_dprintk("v4v_ringbuf_payload_space:tx_ptr=%d rx_ptr=%d\n",
>> +                (int)ring.tx_ptr, (int)ring.rx_ptr);
>> +    if ( ring.rx_ptr == ring.tx_ptr )
>> +        return ring.len - sizeof(struct v4v_ring_message_header);
>> +
>> +    ret = ring.rx_ptr - ring.tx_ptr;
>> +    if ( ret<  0 )
>> +        ret += ring.len;
>> +
>> +    ret -= sizeof(struct v4v_ring_message_header);
>> +    ret -= V4V_ROUNDUP(1);
>> +
>> +    return (ret<  0) ? 0 : ret;
>
> Sanity check for ret>  ring.len as well?  I guess it's not clear what
> you'd do about that.
>
> [...]
>> +static long
>> +v4v_ringbuf_insertv(struct domain *d,
>> +                    struct v4v_ring_info *ring_info,
>> +                    v4v_ring_id_t *src_id, uint32_t message_type,
>> +                    XEN_GUEST_HANDLE_PARAM(v4v_iov_t) iovs,
>> +                    uint32_t niov, size_t len)
>> +{
>> +    v4v_ring_t ring;
>> +    struct v4v_ring_message_header mh = { 0 };
>> +    int32_t sp;
>> +    long happy_ret;
>> +    int32_t ret = 0;
>> +    XEN_GUEST_HANDLE(uint8_t) empty_hnd = { 0 };
>> +
>> +    ASSERT(spin_is_locked(&ring_info->lock));
>> +
>> +    happy_ret = len;
>> +
>> +    if ( (V4V_ROUNDUP(len) + sizeof(struct v4v_ring_message_header) )>=
>> +            ring_info->len)
>> +        return -EMSGSIZE;
>> +
>> +    do {
>> +        if ( (ret = v4v_memcpy_from_guest_ring(&ring, ring_info, 0,
>> +                                               sizeof(ring))) )
>> +            break;
>
> Should this be using an atomic operation to read ring.rx_ptr instead?
> That's the only field we actually use from the guest copy anyway.

Probably. I will look at this as well as other uses of the t/rx_ptr 
usage mentioned above.

>
>> +
>> +        ring.tx_ptr = ring_info->tx_ptr;
>> +        ring.len = ring_info->len;
>> +
>> +        v4v_dprintk("ring.tx_ptr=%d ring.rx_ptr=%d ring.len=%d ring_info->tx_ptr=%d\n",
>> +                    ring.tx_ptr, ring.rx_ptr, ring.len, ring_info->tx_ptr);
>> +
>> +        if ( ring.rx_ptr == ring.tx_ptr )
>> +            sp = ring_info->len;
>> +        else
>> +        {
>> +            sp = ring.rx_ptr - ring.tx_ptr;
>> +            if ( sp<  0 )
>> +                sp += ring.len;
>> +        }
>> +
>> +        if ( (V4V_ROUNDUP(len) + sizeof(struct v4v_ring_message_header))>= sp )
>> +        {
>> +            v4v_dprintk("EAGAIN\n");
>> +            ret = -EAGAIN;
>> +            break;
>> +        }
>> +
>> +        mh.len = len + sizeof (struct v4v_ring_message_header);
>> +        mh.source.port = src_id->addr.port;
>> +        mh.source.domain = src_id->addr.domain;
>> +        mh.message_type = message_type;
>> +
>> +        if ( (ret = v4v_memcpy_to_guest_ring(ring_info,
>> +                                             ring.tx_ptr + sizeof(v4v_ring_t),
>> +&mh, empty_hnd,
>> +                                             sizeof(mh))) )
>> +            break;
>
> OK, I think this is correct, because the tx_ptr is always 16-byte
> aligned and the message header is 16 bytes long, but it's not
> obvious. :) Maybe add a comment here, and a BUILD_BUG_ON() to assert
> that sizeof (struct v4v_ring_message_header)<= V4V_ROUNDUP(1)?

Ack

>
>> +        ring.tx_ptr += sizeof(mh);
>> +        if ( ring.tx_ptr == ring_info->len )
>> +            ring.tx_ptr = 0;
>> +
>> +        while ( niov-- )
>> +        {
>> +            XEN_GUEST_HANDLE_PARAM(uint8_t) bufp_hnd;
>> +            XEN_GUEST_HANDLE(uint8_t) buf_hnd;
>> +            v4v_iov_t iov;
>> +
>> +            if ( copy_from_guest(&iov, iovs, 1) )
>
> Hrmn.  So here we re-copy the iov from the guest, but the guest might
> have changed it since we looked at it last (when we calculated the total
> length).  That means we need to keep a running count of how much we copy,
> to be sure the guest doesn't swap short iovs for long ones and make
> us overrun the ring's rx_ptr.

Yea I was starting to have concerns about just this right after I posted 
patch set 9. I really don't like that we read it twice. Your suggestion 
sounds reasonable.

>
>> +            {
>> +                ret = -EFAULT;
>> +                break;
>> +            }
>> +
>> +            bufp_hnd = guest_handle_from_ptr(iov.iov_base, uint8_t);
>> +            buf_hnd = guest_handle_from_param(bufp_hnd, uint8_t);
>> +            len = iov.iov_len;
>> +
>> +            if ( unlikely(!guest_handle_okay(buf_hnd, len)) )
>> +            {
>> +                ret = -EFAULT;
>> +                break;
>> +            }
>> +
>> +            sp = ring.len - ring.tx_ptr;
>> +
>> +            if ( len>  sp )
>> +            {
>> +                ret = v4v_memcpy_to_guest_ring(ring_info,
>> +                        ring.tx_ptr + sizeof(v4v_ring_t),
>> +                        NULL, buf_hnd, sp);
>> +                if ( ret )
>> +                    break;
>> +
>> +                ring.tx_ptr = 0;
>> +                len -= sp;
>> +                guest_handle_add_offset(buf_hnd, sp);
>> +            }
>> +
>> +            ret = v4v_memcpy_to_guest_ring(ring_info,
>> +                    ring.tx_ptr + sizeof(v4v_ring_t),
>> +                    NULL, buf_hnd, len);
>> +            if ( ret )
>> +                break;
>> +
>> +            ring.tx_ptr += len;
>> +
>> +            if ( ring.tx_ptr == ring_info->len )
>> +                ring.tx_ptr = 0;
>> +
>> +            guest_handle_add_offset(iovs, 1);
>> +        }
>> +        if ( ret )
>> +            break;
>> +
>> +        ring.tx_ptr = V4V_ROUNDUP(ring.tx_ptr);
>> +
>> +        if ( ring.tx_ptr>= ring_info->len )
>> +            ring.tx_ptr -= ring_info->len;
>> +
>> +        mb();
>> +        ring_info->tx_ptr = ring.tx_ptr;
>> +        if ( (ret = v4v_update_tx_ptr(ring_info, ring.tx_ptr)) )
>> +            break;
>> +    } while ( 0 );
>> +
>> +    v4v_ring_unmap(ring_info);
>> +
>> +    return ret ? ret : happy_ret;
>> +}
> [...]
>> +static int
>> +v4v_pending_requeue(struct v4v_ring_info *ring_info, domid_t src_id, int len)
>> +{
>> +    struct hlist_node *node;
>> +    struct v4v_pending_ent *ent;
>> +
>> +    hlist_for_each_entry(ent, node,&ring_info->pending, node)
>> +    {
>> +        if ( ent->id == src_id )
>> +        {
>> +            if ( ent->len<  len )
>> +                ent->len = len;
>
> Does this mean that if I call V4VOP_notify with some large space
> requirement, and then again with a smaller one, I won't get a
> notification until the original large amount is free?  Seems like this
> should always overwrite the old length instead, and notify on the most
> recently requested amount.

It has been a while since I have looked closely at the queuing code so I 
will have to study this some. I believe it does the right thing but I 
will check.

>
>> +static int
>> +v4v_fill_ring_data(struct domain *src_d,
>> +                   XEN_GUEST_HANDLE(v4v_ring_data_ent_t) data_ent_hnd)
>> +{
>> +    v4v_ring_data_ent_t ent;
>> +    struct domain *dst_d;
>> +    struct v4v_ring_info *ring_info;
>> +
>> +    if ( copy_from_guest(&ent, data_ent_hnd, 1) )
>> +    {
>> +        v4v_dprintk("EFAULT\n");
>> +        return -EFAULT;
>> +    }
>> +
>> +    v4v_dprintk("v4v_fill_ring_data: ent.ring.domain=%d,ent.ring.port=%u\n",
>> +                (int)ent.ring.domain, (int)ent.ring.port);
>> +
>> +    ent.flags = 0;
>> +
>> +    dst_d = get_domain_by_id(ent.ring.domain);
>> +
>> +    if ( dst_d&&  dst_d->v4v )
>> +    {
>> +        read_lock(&dst_d->v4v->lock);
>> +        ring_info = v4v_ring_find_info_by_addr(dst_d,&ent.ring,
>> +                                               src_d->domain_id);
>> +
>
> Should there be a call to v4vtables_check() here?  Otherwise we leak a
> little bit of information to a guest who's not supposed to see the ring,
> (and also it's a bit odd to say 'yes, you may send' and then bounce the
> sendv call).

Yea I think so. Concerning all the v4vtables issues, we are now looking 
at reworking a lot of this. We just started discussing options so I 
don't have a lot to reply with yet.

>
> [...]
>> +static void v4v_ring_remove_mfns(struct domain *d, struct v4v_ring_info *ring_info)
>> +{
>> +    int i;
>> +
>> +    ASSERT(rw_is_write_locked(&d->v4v->lock));
>> +
>> +    if ( ring_info->mfns )
>> +    {
>> +        for ( i = 0; i<  ring_info->npage; ++i )
>> +            if ( mfn_x(ring_info->mfns[i]) != 0 )
>
> Here, and where the array is populated, it would be better to use
> INVALID_MFN in any empty slots.  On x86, MFN 0 is never allocated to a
> guest, but that may not always be true on all architectures.

Ack.

>
> [...]
>> +static long
>> +v4v_ring_add(struct domain *d, XEN_GUEST_HANDLE_PARAM(v4v_ring_t) ring_hnd,
>> +             uint32_t npage, XEN_GUEST_HANDLE_PARAM(v4v_pfn_t) pfn_hnd)
>> +{
> [...]
>> +        /*
>> +         * no need for a lock yet, because only we know about this
>> +         * set the tx pointer if it looks bogus (we don't reset it
>> +         * because this might be a re-register after S4)
>> +         */
>> +        if ( (ring.tx_ptr>= ring.len)
>> +                || (V4V_ROUNDUP(ring.tx_ptr) != ring.tx_ptr) )
>> +        {
>> +            ring.tx_ptr = ring.rx_ptr;
>> +        }
>
> Maybe check here that the guest-supplied rx_ptr is correctly aligned?
> I don't think an unaligned one would be a security risk, but it might
> cause entertaining breakage on the first sendv().

Ack.

>
>> +        copy_field_to_guest(ring_hnd,&ring, tx_ptr);
>> +
>> +        read_lock(&d->v4v->lock);
>> +        ring_info = v4v_ring_find_info(d,&ring.id);
>> +
>> +        if ( !ring_info )
>> +        {
>> +            read_unlock(&d->v4v->lock);
>> +            ring_info = xmalloc(struct v4v_ring_info);
>> +            if ( !ring_info )
>> +            {
>> +                v4v_dprintk("ENOMEM\n");
>> +                ret = -ENOMEM;
>> +                break;
>> +            }
>> +            need_to_insert++;
>> +            spin_lock_init(&ring_info->lock);
>> +            INIT_HLIST_HEAD(&ring_info->pending);
>> +            ring_info->mfns = NULL;
>> +        }
>> +        else
>> +        {
>> +            /*
>> +             * Ring info already existed.
>> +             */
>> +            printk(KERN_INFO "v4v: dom%d ring already registered\n",
>> +                    current->domain->domain_id);
>> +            ret = -EEXIST;
>> +            break;
>> +        }
>
> If you reshuffle these so that the already-exists case is first, then
> the not-already-exists case doesn't need to be indented, i.e.
>
>      if ( ring_info )
>      {
>          printk();
>          ret = -EEXIST;
>          break;
>      }
>      /* Normal case goes here */
>
> Then I think it becomes clear that you don't need a 'need_to_insert'
> flag -- and that the read_unlock() below is dead code, so the error path
> doesn't unlock at all!
>
>> +
>> +        spin_lock(&ring_info->lock);
>> +        ring_info->id = ring.id;
>> +        ring_info->len = ring.len;
>> +        ring_info->tx_ptr = ring.tx_ptr;
>> +        ring_info->ring = ring_hnd;
>
> I can't see where this field is used, but AFAICT the ring_hnd isn't
> guaranteed to be valid after this hypercall finishes (and certainly not
> after, say, the next context switch), so storing it doesn't seem useful.

Yes I don't understand that one either. It will go away.

>
>> +        if ( ring_info->mfns )
>> +            xfree(ring_info->mfns);
>> +        ret = v4v_find_ring_mfns(d, ring_info, npage, pfn_hnd);
>> +        spin_unlock(&ring_info->lock);
>> +        if ( ret )
>> +            break;
>> +
>> +        if ( !need_to_insert )
>> +        {
>> +            read_unlock(&d->v4v->lock);
>> +        }
>> +        else
>> +        {
>> +            uint16_t hash = v4v_hash_fn(&ring.id);
>> +
>> +            write_lock(&d->v4v->lock);
>> +            hlist_add_head(&ring_info->node,&d->v4v->ring_hash[hash]);
>> +            write_unlock(&d->v4v->lock);
>> +        }
>> +    } while ( 0 );
>> +
>> +    read_unlock(&v4v_lock);
>> +    return ret;
>> +}
> [...]
>> +void
>> +v4vtables_print_rule(struct v4vtables_rule_node *node)
>> +{
> [...]
>> +    if ( rule->accept == 1 )
>> +        printk("ACCEPT");
>
> The logic in the actual ACL is (effectively) 'if ( rule->accept != 0 )';
> so this probably ought to follow suit. :)

Ack

>
> [...]
>> +int
>> +v4vtables_del(struct domain *src_d,
>> +              XEN_GUEST_HANDLE_PARAM(v4vtables_rule_t) rule_hnd,
>> +              int32_t position)
>> +{
>> +    struct list_head *tmp = NULL;
>> +    struct list_head *to_delete = NULL;
>> +    struct list_head *next = NULL;
>> +    struct v4vtables_rule_node *node;
>> +
>> +    ASSERT(rw_is_write_locked(&v4vtables_rules_lock));
>> +
>> +    v4v_dprintk("v4vtables_del position:%d\n", position);
>> +
>> +    if ( position != -1 )
>> +    {
>> +        /* We want to delete the rule number<position>  */
>> +        list_for_each(tmp,&v4vtables_rules)
>> +        {
>> +            to_delete = tmp;
>> +            if (position == 0)
>> +                break;
>> +            position--;
>> +        }
>
> Hmmm.  If this is called with position = 1 with a multi-element list, it
> selects the second element (which conflicts with the semantics of the
> add operation, where '1' is the first element).  And if it's called with
> '1' on a single-element list, it selects the first (and only) element.

Needs fixing along with other problems in the v4vtables.

>
>> +        /* Can't find the position */
>> +        if (position != 0)
>> +            to_delete = NULL;
>> +    }
>> +    else if ( !guest_handle_is_null(rule_hnd) )
>> +    {
>> +        struct v4vtables_rule r;
>> +
>> +        if ( copy_from_guest(&r, rule_hnd, 1) )
>> +            return -EFAULT;
>> +
>> +        list_for_each(tmp,&v4vtables_rules)
>> +        {
>> +            node = list_entry(tmp, struct v4vtables_rule_node, list);
>> +
>> +            if ( (node->rule.src.domain == r.src.domain)&&
>> +                 (node->rule.src.port   == r.src.port)&&
>> +                 (node->rule.dst.domain == r.dst.domain)&&
>> +                 (node->rule.dst.port   == r.dst.port))
>
> Not matching by 'accept' too?  That should be mentioned in the hypercall
> interface, I think.
>
>> +            {
>> +                to_delete = tmp;
>> +                break;
>> +            }
>> +        }
>> +    }
>> +    else
>> +    {
>> +        /* We want to flush the rules! */
>> +        printk(KERN_ERR "VIPTables: flushing rules\n");
>> +        list_for_each_safe(tmp, next,&v4vtables_rules)
>> +        {
>> +            node = list_entry(tmp, struct v4vtables_rule_node, list);
>> +            list_del(tmp);
>> +            xfree(node);
>> +        }
>> +    }
>> +
>> +    if ( to_delete )
>> +    {
>> +        node = list_entry(to_delete, struct v4vtables_rule_node, list);
>> +#ifdef V4V_DEBUG
>> +        printk(KERN_ERR "VIPTables: deleting rule: ");
>> +        v4vtables_print_rule(node);
>> +#endif /* V4V_DEBUG */
>> +        list_del(to_delete);
>> +        xfree(node);
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static size_t
>> +v4vtables_list(struct domain *src_d,
>> +               XEN_GUEST_HANDLE_PARAM(v4vtables_list_t) list_hnd)
>> +{
>> +    struct list_head *ptr;
>> +    struct v4vtables_rule_node *node;
>> +    struct v4vtables_list rules_list;
>> +    uint32_t nbrules;
>> +    XEN_GUEST_HANDLE(v4vtables_rule_t) guest_rules;
>> +
>> +    ASSERT(rw_is_locked(&v4vtables_rules_lock));
>> +
>> +    memset(&rules_list, 0, sizeof (rules_list));
>> +    if ( copy_from_guest(&rules_list, list_hnd, 1) )
>> +        return -EFAULT;
>> +
>> +    ptr = v4vtables_rules.next;
>> +    while ( rules_list.start_rule != 0&&  ptr->next !=&v4vtables_rules )
>> +    {
>> +        ptr = ptr->next;
>> +        rules_list.start_rule--;
>> +    }
>
> Again, that seems to conflict with the 'add' semantics that the first
> rule is rule 1.
>
> And what if the supplied start_rule was too large?  As it stands I think
> this will return the last rule in the list rather than returning no
> rules (or an error).
>
>> +
>> +    if ( rules_list.nb_rules == 0 )
>> +        return -EINVAL;
>> +
>> +    guest_rules = guest_handle_for_field(list_hnd, v4vtables_rule_t, rules[0]);
>> +
>> +    nbrules = 0;
>> +    while ( nbrules<  rules_list.nb_rules&&  ptr !=&v4vtables_rules )
>> +    {
>> +        node = list_entry(ptr, struct v4vtables_rule_node, list);
>> +
>> +        if ( copy_to_guest(guest_rules,&node->rule, 1) )
>> +            break;
>
> Maybe also arrange to return -EFAULT?
>
>> +
>> +        guest_handle_add_offset(guest_rules, 1);
>> +
>> +        nbrules++;
>> +        ptr = ptr->next;
>> +    }
>> +
>> +    rules_list.nb_rules = nbrules;
>> +    if ( copy_field_to_guest(list_hnd,&rules_list, nb_rules) )
>> +        return -EFAULT;
>> +
>> +    return 0;
>> +}
>> +
>> +static size_t
>> +v4vtables_check(v4v_addr_t *src, v4v_addr_t *dst)
>> +{
>> +    struct list_head *ptr;
>> +    struct v4vtables_rule_node *node;
>> +    size_t ret = 0; /* Defaulting to ACCEPT */
>
> size_t?  bool_t would be better.  Also, having this function return the
> inverse of the rule encoding is just perverse.
>
> [...]
>> +    case V4VOP_tables_add:
>> +    {
>> +        uint32_t position = arg3;
>> +
>> +        XEN_GUEST_HANDLE_PARAM(v4vtables_rule_t) rule_hnd =
>> +               guest_handle_cast(arg1, v4vtables_rule_t);
>> +        rc = -EPERM;
>> +        if ( !is_hardware_domain(d) )
>> +            goto out;
>
> I think this ought to be an XSM check now, with XSM_PRIV to indicate
> that the default policy is to check IS_PRIV().  (Likewise for the
> other table ops.)

Ok. Again, we have v4vtables high on our list of needs overhauling.

>
> [...]
>> +struct keyhandler v4v_info_keyhandler =
>> +{
>> +    .diagnostic = 1,
>> +    .u.fn = dump_state,
>> +    .desc = "dump v4v states and interupt"
>
> s/interupt/interrupt/
>
> [...]
>> +/*
>> + * V4VOP_register_ring
>> + *
>> + * Registers a ring with Xen. If a ring with the same v4v_ring_id exists,
>> + * the hypercall will return -EEXIST.
>> + *
>> + * do_v4v_op(V4VOP_register_ring,
>> + *           XEN_GUEST_HANDLE(v4v_ring_t), XEN_GUEST_HANDLE(v4v_pfn_t),
>> + *           npage, 0)
>
> Maybe add a 'uint32_t ' before npage, to match the same style below?
>
> [...]
>> +/*
>> + * V4VOP_sendv
>> + *
>> + * Sends of list of buffer contained in iov.
>> + *
>> + * For each iov entry send iov_len bytes of iov_base to addr.dst, giving
>> + * src as the source address (xen will ignore src->domain and put your
>> + * domain in the actually message), xen first looks for a ring with id.addr==dst
>> + * and id.partner==sending_domain if that fails it looks for id.addr==dst and
>
> These lines are longer than 80 chars - can you wrap a little narrower please?
> Also s/actually/actual/.
>
>> + * id.partner==DOMID_ANY.
>> + * message_type is the 32 bit number used from the message
>> + * most likely V4V_MESSAGE_DGRAM or V4V_MESSAGE_STREAM. If insufficient space exists
>> + * it will return -EAGAIN and xen will twing the V4V_INTERRUPT when
>
> 'twing'? :)

Not my "verb" :)

>
> [...]
>> +/*
>> + * V4VOP_tables_add
>> + *
>> + * Insert a filtering rules after a given position.
>> + *
>
> Can you add a description of the semantics of the rules?
> E.g. does the first match or the last match or the tightest match dominate?
>
>> + * do_v4v_op(V4VOP_tables_add,
>> + *           XEN_GUEST_HANDLE(v4vtables_rule_t) rule,
>> + *           NULL,
>> + *           uint32_t position, 0)
>> + */
>> +#define V4VOP_tables_add        7
>> +
>> +/*
>> + * V4VOP_tables_del
>> + *
>> + * Delete a filtering rules at a position or the rule
>> + * that matches "rule".
>> + *
>> + * do_v4v_op(V4VOP_tables_del,
>> + *           XEN_GUEST_HANDLE(v4vtables_rule_t) rule,
>> + *           NULL,
>> + *           uint32_t position, 0)
>> + */
>> +#define V4VOP_tables_del       8
>
> Can you describe the calling convention a bit more fully?  In particular
> that position should == -1 if you want to match by rule (though I think
> == 0 would make more sense), and that position == -1&&  rule == NULL
> means to flush everything.

Yes, I will take all your v4vtables concerns into account when we work 
on it.

Again, thanks a bunch.
Ross

>
> Cheers,
>
> Tim.