All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] Add V4V to Xen (v6)
@ 2012-09-20 11:41 Jean Guyader
  2012-09-20 11:41 ` [PATCH 1/4] xen: Introduce guest_handle_for_field Jean Guyader
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Jean Guyader @ 2012-09-20 11:41 UTC (permalink / raw)
  To: xen-devel; +Cc: tim, Jean Guyader, JBeulich

[-- Attachment #1: Type: text/plain, Size: 2881 bytes --]

v6 changes:
        - Check compiler before using [0] or [].

v5 changes:
        - Fix prototypes in v4v.h for do_v4v_op
        - Add padding to make sure all the structures
          are 64 bits aligned
        - Replace [0] with []

v4 changes:
        - Stop using ssize_t, use long instead
        - Return -MSGSIZE for message bigger than 2GB
        - Fixup size handling
        - Rename protocol with message_type to avoid the
          confusion with the IP protocol flag
        - Replaced V4V_DOMID_ANY value with DOMID_INVALID
        - Replaced v4v_pfn_t with xen_pfn_t
        - Add padding to struct v4v_info
        - Fixup hypercall documentation
        - Move V4V_ROUNDUP to v4v.c
        - Remove v4v_utils.h (we could potentially add it later).

v3 changes:
        - Switch to event channel
                - Allocated a unbound event channel
                  per domain.
                - Add a new v4v call to share the
                  event channel port.
        - Public headers with actual type definition
        - Align all the v4v type to 64 bits
        - Modify v4v MAGIC numbers because we won't
          but backward compatible anymore
        - Merge insert and insertv
        - Merge send and sendv
        - Turn all the lock prerequisite from comment
          to ASSERT()
        - Make use or write_atomic instead of volatile pointers
        - Merge v4v_memcpy_to_guest_ring and
          v4v_memcpy_to_guest_ring_from_guest
                - Introduce copy_from_guest_maybe that can take
                  a void * and a handle as src address.
        - Replace 6 arguments hypercalls with 5 arguments hypercalls.

v2 changes:
        - Cleanup plugin header
        - Include basic access control
        - Use guest_handle_for_field

Jan Beulich (1):
  xen: Introduce guest_handle_for_field

Jean Guyader (3):
  xen: virq, remove VIRQ_XC_RESERVED
  xen: events, exposes evtchn_alloc_unbound_domain
  xen: Add V4V implementation

 xen/arch/x86/hvm/hvm.c             |    6 +-
 xen/arch/x86/x86_64/compat/entry.S |    2 +
 xen/arch/x86/x86_64/entry.S        |    2 +
 xen/common/Makefile                |    1 +
 xen/common/domain.c                |   13 +-
 xen/common/event_channel.c         |   39 +-
 xen/common/v4v.c                   | 1905 ++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/guest_access.h |    3 +
 xen/include/public/v4v.h           |  324 ++++++
 xen/include/public/xen.h           |    3 +-
 xen/include/xen/event.h            |    3 +
 xen/include/xen/sched.h            |    4 +
 xen/include/xen/v4v.h              |  133 +++
 13 files changed, 2421 insertions(+), 17 deletions(-)
 create mode 100644 xen/common/v4v.c
 create mode 100644 xen/include/public/v4v.h
 create mode 100644 xen/include/xen/v4v.h

-- 
1.7.9.5


[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/4] xen: Introduce guest_handle_for_field
  2012-09-20 11:41 [PATCH 0/4] Add V4V to Xen (v6) Jean Guyader
@ 2012-09-20 11:41 ` Jean Guyader
  2012-09-20 11:42 ` [PATCH 2/4] xen: virq, remove VIRQ_XC_RESERVED Jean Guyader
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 21+ messages in thread
From: Jean Guyader @ 2012-09-20 11:41 UTC (permalink / raw)
  To: xen-devel; +Cc: tim, JBeulich

[-- Attachment #1: Type: text/plain, Size: 206 bytes --]


This helper turns a field of a GUEST_HANDLE in
a GUEST_HANDLE.

Signed-off-by: Jan Beulich <JBeulich@suse.com>
---
 xen/include/asm-x86/guest_access.h |    3 +++
 1 file changed, 3 insertions(+)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-xen-Introduce-guest_handle_for_field.patch --]
[-- Type: text/x-patch; name="0001-xen-Introduce-guest_handle_for_field.patch", Size: 547 bytes --]

diff --git a/xen/include/asm-x86/guest_access.h b/xen/include/asm-x86/guest_access.h
index 2b429c2..e3ac1d6 100644
--- a/xen/include/asm-x86/guest_access.h
+++ b/xen/include/asm-x86/guest_access.h
@@ -51,6 +51,9 @@
     (XEN_GUEST_HANDLE(type)) { _x };            \
 })
 
+#define guest_handle_for_field(hnd, type, fld)          \
+    ((XEN_GUEST_HANDLE(type)) { &(hnd).p->fld })
+
 #define guest_handle_from_ptr(ptr, type)        \
     ((XEN_GUEST_HANDLE(type)) { (type *)ptr })
 #define const_guest_handle_from_ptr(ptr, type)  \

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/4] xen: virq, remove VIRQ_XC_RESERVED
  2012-09-20 11:41 [PATCH 0/4] Add V4V to Xen (v6) Jean Guyader
  2012-09-20 11:41 ` [PATCH 1/4] xen: Introduce guest_handle_for_field Jean Guyader
@ 2012-09-20 11:42 ` Jean Guyader
  2012-09-20 11:42 ` [PATCH 3/4] xen: events, exposes evtchn_alloc_unbound_domain Jean Guyader
  2012-09-20 11:42 ` [PATCH 4/4] xen: Add V4V implementation Jean Guyader
  3 siblings, 0 replies; 21+ messages in thread
From: Jean Guyader @ 2012-09-20 11:42 UTC (permalink / raw)
  To: xen-devel; +Cc: tim, Jean Guyader, JBeulich

[-- Attachment #1: Type: text/plain, Size: 257 bytes --]


VIRQ_XC_RESERVED was reserved for V4V but we have switched
to event channels so this place holder is no longer required.

Signed-off-by: Jean Guyader <jean.guyader@citrix.com>
---
 xen/include/public/xen.h |    1 -
 1 file changed, 1 deletion(-)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0002-xen-virq-remove-VIRQ_XC_RESERVED.patch --]
[-- Type: text/x-patch; name="0002-xen-virq-remove-VIRQ_XC_RESERVED.patch", Size: 667 bytes --]

diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 361398b..8def420 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -155,7 +155,6 @@ DEFINE_XEN_GUEST_HANDLE(xen_pfn_t);
 #define VIRQ_CON_RING   8  /* G. (DOM0) Bytes received on console            */
 #define VIRQ_PCPU_STATE 9  /* G. (DOM0) PCPU state changed                   */
 #define VIRQ_MEM_EVENT  10 /* G. (DOM0) A memory event has occured           */
-#define VIRQ_XC_RESERVED 11 /* G. Reserved for XenClient                     */
 #define VIRQ_ENOMEM     12 /* G. (DOM0) Low on heap memory       */
 
 /* Architecture-specific VIRQ definitions. */

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/4] xen: events, exposes evtchn_alloc_unbound_domain
  2012-09-20 11:41 [PATCH 0/4] Add V4V to Xen (v6) Jean Guyader
  2012-09-20 11:41 ` [PATCH 1/4] xen: Introduce guest_handle_for_field Jean Guyader
  2012-09-20 11:42 ` [PATCH 2/4] xen: virq, remove VIRQ_XC_RESERVED Jean Guyader
@ 2012-09-20 11:42 ` Jean Guyader
  2012-09-20 11:42 ` [PATCH 4/4] xen: Add V4V implementation Jean Guyader
  3 siblings, 0 replies; 21+ messages in thread
From: Jean Guyader @ 2012-09-20 11:42 UTC (permalink / raw)
  To: xen-devel; +Cc: tim, Jean Guyader, JBeulich

[-- Attachment #1: Type: text/plain, Size: 346 bytes --]


Exposes evtchn_alloc_unbound_domain to the rest of
Xen so we can create allocated unbound evtchn within Xen.

Signed-off-by: Jean Guyader <jean.guyader@citrix.com>
---
 xen/common/event_channel.c |   39 +++++++++++++++++++++++++++------------
 xen/include/xen/event.h    |    3 +++
 2 files changed, 30 insertions(+), 12 deletions(-)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0003-xen-events-exposes-evtchn_alloc_unbound_domain.patch --]
[-- Type: text/x-patch; name="0003-xen-events-exposes-evtchn_alloc_unbound_domain.patch", Size: 2529 bytes --]

diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 53777f8..6a0e342 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -159,36 +159,51 @@ static int get_free_port(struct domain *d)
 
 static long evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc)
 {
-    struct evtchn *chn;
     struct domain *d;
-    int            port;
-    domid_t        dom = alloc->dom;
     long           rc;
 
-    rc = rcu_lock_target_domain_by_id(dom, &d);
+    rc = rcu_lock_target_domain_by_id(alloc->dom, &d);
     if ( rc )
         return rc;
 
+    rc = evtchn_alloc_unbound_domain(d, &alloc->port, alloc->remote_dom);
+    if ( rc )
+        ERROR_EXIT_DOM((int)rc, d);
+
+ out:
+    rcu_unlock_domain(d);
+
+    return rc;
+}
+
+int evtchn_alloc_unbound_domain(struct domain *d, evtchn_port_t *port,
+                                domid_t remote_domid)
+{
+    struct evtchn *chn;
+    int           rc;
+    int           free_port;
+
     spin_lock(&d->event_lock);
 
-    if ( (port = get_free_port(d)) < 0 )
-        ERROR_EXIT_DOM(port, d);
-    chn = evtchn_from_port(d, port);
+    rc = free_port = get_free_port(d);
+    if ( free_port < 0 )
+        goto out;
 
-    rc = xsm_evtchn_unbound(d, chn, alloc->remote_dom);
+    chn = evtchn_from_port(d, free_port);
+    rc = xsm_evtchn_unbound(d, chn, remote_domid);
     if ( rc )
         goto out;
 
     chn->state = ECS_UNBOUND;
-    if ( (chn->u.unbound.remote_domid = alloc->remote_dom) == DOMID_SELF )
+    if ( (chn->u.unbound.remote_domid = remote_domid) == DOMID_SELF )
         chn->u.unbound.remote_domid = current->domain->domain_id;
 
-    alloc->port = port;
+    *port = free_port;
+    /* Everything is fine, returns 0 */
+    rc = 0;
 
  out:
     spin_unlock(&d->event_lock);
-    rcu_unlock_domain(d);
-
     return rc;
 }
 
diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
index 71c3e92..1a0c832 100644
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -69,6 +69,9 @@ int guest_enabled_event(struct vcpu *v, uint32_t virq);
 /* Notify remote end of a Xen-attached event channel.*/
 void notify_via_xen_event_channel(struct domain *ld, int lport);
 
+int evtchn_alloc_unbound_domain(struct domain *d, evtchn_port_t *port,
+                                domid_t remote_domid);
+
 /* Internal event channel object accessors */
 #define bucket_from_port(d,p) \
     ((d)->evtchn[(p)/EVTCHNS_PER_BUCKET])

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 4/4] xen: Add V4V implementation
  2012-09-20 11:41 [PATCH 0/4] Add V4V to Xen (v6) Jean Guyader
                   ` (2 preceding siblings ...)
  2012-09-20 11:42 ` [PATCH 3/4] xen: events, exposes evtchn_alloc_unbound_domain Jean Guyader
@ 2012-09-20 11:42 ` Jean Guyader
  2012-09-20 12:20   ` Jan Beulich
  2012-09-22 13:47   ` Julian Pidancet
  3 siblings, 2 replies; 21+ messages in thread
From: Jean Guyader @ 2012-09-20 11:42 UTC (permalink / raw)
  To: xen-devel; +Cc: tim, Jean Guyader, JBeulich

[-- Attachment #1: Type: text/plain, Size: 900 bytes --]


Setup of v4v domains a domain gets created and cleanup
when a domain die. Wire up the v4v hypercall.

Include v4v internal and public headers.

Signed-off-by: Jean Guyader <jean.guyader@citrix.com>
---
 xen/arch/x86/hvm/hvm.c             |    6 +-
 xen/arch/x86/x86_64/compat/entry.S |    2 +
 xen/arch/x86/x86_64/entry.S        |    2 +
 xen/common/Makefile                |    1 +
 xen/common/domain.c                |   13 +-
 xen/common/v4v.c                   | 1905 ++++++++++++++++++++++++++++++++++++
 xen/include/public/v4v.h           |  324 ++++++
 xen/include/public/xen.h           |    2 +-
 xen/include/xen/sched.h            |    4 +
 xen/include/xen/v4v.h              |  133 +++
 10 files changed, 2388 insertions(+), 4 deletions(-)
 create mode 100644 xen/common/v4v.c
 create mode 100644 xen/include/public/v4v.h
 create mode 100644 xen/include/xen/v4v.h


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0004-xen-Add-V4V-implementation.patch --]
[-- Type: text/x-patch; name="0004-xen-Add-V4V-implementation.patch", Size: 69656 bytes --]

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index d69e419..b685535 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3183,7 +3183,8 @@ static hvm_hypercall_t *const hvm_hypercall64_table[NR_hypercalls] = {
     HYPERCALL(set_timer_op),
     HYPERCALL(hvm_op),
     HYPERCALL(sysctl),
-    HYPERCALL(tmem_op)
+    HYPERCALL(tmem_op),
+    HYPERCALL(v4v_op)
 };
 
 #define COMPAT_CALL(x)                                        \
@@ -3200,7 +3201,8 @@ static hvm_hypercall_t *const hvm_hypercall32_table[NR_hypercalls] = {
     COMPAT_CALL(set_timer_op),
     HYPERCALL(hvm_op),
     HYPERCALL(sysctl),
-    HYPERCALL(tmem_op)
+    HYPERCALL(tmem_op),
+    HYPERCALL(v4v_op)
 };
 
 int hvm_do_hypercall(struct cpu_user_regs *regs)
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index 2f606ab..f518a03 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -414,6 +414,7 @@ ENTRY(compat_hypercall_table)
         .quad do_domctl
         .quad compat_kexec_op
         .quad do_tmem_op
+        .quad do_v4v_op
         .rept __HYPERVISOR_arch_0-((.-compat_hypercall_table)/8)
         .quad compat_ni_hypercall
         .endr
@@ -462,6 +463,7 @@ ENTRY(compat_hypercall_args_table)
         .byte 1 /* do_domctl                */
         .byte 2 /* compat_kexec_op          */
         .byte 1 /* do_tmem_op               */
+        .byte 5 /* do_v4v_op		    */
         .rept __HYPERVISOR_arch_0-(.-compat_hypercall_args_table)
         .byte 0 /* compat_ni_hypercall      */
         .endr
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 8156827..5f4e7bf 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -707,6 +707,7 @@ ENTRY(hypercall_table)
         .quad do_domctl
         .quad do_kexec_op
         .quad do_tmem_op
+        .quad do_v4v_op
         .rept __HYPERVISOR_arch_0-((.-hypercall_table)/8)
         .quad do_ni_hypercall
         .endr
@@ -755,6 +756,7 @@ ENTRY(hypercall_args_table)
         .byte 1 /* do_domctl            */
         .byte 2 /* do_kexec             */
         .byte 1 /* do_tmem_op           */
+        .byte 5 /* do_v4v_op		*/
         .rept __HYPERVISOR_arch_0-(.-hypercall_args_table)
         .byte 0 /* do_ni_hypercall      */
         .endr
diff --git a/xen/common/Makefile b/xen/common/Makefile
index c1c100f..ba63cec 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -45,6 +45,7 @@ obj-y += tmem_xen.o
 obj-y += radix-tree.o
 obj-y += rbtree.o
 obj-y += lzo.o
+obj-y += v4v.o
 
 obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma unlzo,$(n).init.o)
 
diff --git a/xen/common/domain.c b/xen/common/domain.c
index a1aa05e..94283e3 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -195,7 +195,8 @@ struct domain *domain_create(
 {
     struct domain *d, **pd;
     enum { INIT_xsm = 1u<<0, INIT_watchdog = 1u<<1, INIT_rangeset = 1u<<2,
-           INIT_evtchn = 1u<<3, INIT_gnttab = 1u<<4, INIT_arch = 1u<<5 };
+           INIT_evtchn = 1u<<3, INIT_gnttab = 1u<<4, INIT_arch = 1u<<5,
+           INIT_v4v = 1u<<6 };
     int err, init_status = 0;
     int poolid = CPUPOOLID_NONE;
 
@@ -307,6 +308,13 @@ struct domain *domain_create(
         spin_unlock(&domlist_update_lock);
     }
 
+    if ( !is_idle_domain(d) )
+    {
+        if ( v4v_init(d) != 0 )
+            goto fail;
+        init_status |= INIT_v4v;
+    }
+
     return d;
 
  fail:
@@ -315,6 +323,8 @@ struct domain *domain_create(
     xfree(d->mem_event);
     if ( init_status & INIT_arch )
         arch_domain_destroy(d);
+    if ( init_status & INIT_v4v )
+	v4v_destroy(d);
     if ( init_status & INIT_gnttab )
         grant_table_destroy(d);
     if ( init_status & INIT_evtchn )
@@ -468,6 +478,7 @@ int domain_kill(struct domain *d)
         domain_pause(d);
         d->is_dying = DOMDYING_dying;
         spin_barrier(&d->domain_lock);
+        v4v_destroy(d);
         evtchn_destroy(d);
         gnttab_release_mappings(d);
         tmem_destroy(d->tmem);
diff --git a/xen/common/v4v.c b/xen/common/v4v.c
new file mode 100644
index 0000000..afb9267
--- /dev/null
+++ b/xen/common/v4v.c
@@ -0,0 +1,1905 @@
+/******************************************************************************
+ * V4V
+ *
+ * Version 2 of v2v (Virtual-to-Virtual)
+ *
+ * Copyright (c) 2010, Citrix Systems
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#include <xen/config.h>
+#include <xen/mm.h>
+#include <xen/compat.h>
+#include <xen/init.h>
+#include <xen/lib.h>
+#include <xen/errno.h>
+#include <xen/sched.h>
+#include <xen/domain.h>
+#include <xen/v4v.h>
+#include <xen/event.h>
+#include <xen/guest_access.h>
+#include <asm/paging.h>
+#include <asm/p2m.h>
+#include <xen/keyhandler.h>
+#include <asm/types.h>
+
+#ifdef V4V_DEBUG
+#define v4v_dprintk(format, args...)            \
+    do {                                        \
+        printk("%s:%d " format,                 \
+               __FILE__, __LINE__, ## args );   \
+    } while ( 1 == 0 )
+#else
+#define v4v_dprintk(format, ... ) (void)0
+#endif
+
+/*
+ * Messages on the ring are padded to 128 bits
+ * Len here refers to the exact length of the data not including the
+ * 128 bit header. The message uses
+ * ((len +0xf) & ~0xf) + sizeof(v4v_ring_message_header) bytes
+ */
+#define V4V_ROUNDUP(a) (((a) +0xf ) & ~0xf)
+
+DEFINE_XEN_GUEST_HANDLE(uint8_t);
+static struct v4v_ring_info *v4v_ring_find_info(struct domain *d,
+                                                v4v_ring_id_t *id);
+
+static struct v4v_ring_info *v4v_ring_find_info_by_addr(struct domain *d,
+                                                        struct v4v_addr *a,
+                                                        domid_t p);
+
+typedef struct internal_v4v_iov
+{
+    XEN_GUEST_HANDLE(v4v_iov_t) guest_iov;
+    v4v_iov_t                   *xen_iov;
+} internal_v4v_iov_t;
+
+struct list_head viprules = LIST_HEAD_INIT(viprules);
+
+/*
+ * locks
+ */
+
+/*
+ * locking is organized as follows:
+ *
+ * the global lock v4v_lock: L1 protects the v4v elements
+ * of all struct domain *d in the system, it does not
+ * protect any of the elements of d->v4v, just their
+ * addresses. By extension since the destruction of
+ * a domain with a non-NULL d->v4v will need to free
+ * the d->v4v pointer, holding this lock gauruntees
+ * that no domains pointers in which v4v is interested
+ * become invalid whilst this lock is held.
+ */
+
+static DEFINE_RWLOCK(v4v_lock); /* L1 */
+
+/*
+ * the lock d->v4v->lock: L2:  Read on protects the hash table and
+ * the elements in the hash_table d->v4v->ring_hash, and
+ * the node and id fields in struct v4v_ring_info in the
+ * hash table. Write on L2 protects all of the elements of
+ * struct v4v_ring_info. To take L2 you must already have R(L1)
+ * W(L1) implies W(L2) and L3
+ *
+ * the lock v4v_ring_info *ringinfo; ringinfo->lock: L3:
+ * protects len,tx_ptr the guest ring, the
+ * guest ring_data and the pending list. To take L3 you must
+ * already have R(L2). W(L2) implies L3
+ */
+
+/*
+ * lock to protect the filtering rules list: viprules
+ *
+ * The write lock is held for viptables_del and viptables_add
+ * The read lock is held for viptable_list
+ */
+static DEFINE_RWLOCK(viprules_lock);
+
+
+/*
+ * Debugs
+ */
+
+#ifdef V4V_DEBUG
+static void __attribute__((unused))
+v4v_hexdump(void *_p, int len)
+{
+    uint8_t *buf = (uint8_t *)_p;
+    int i, j;
+
+    for ( i = 0; i < len; i += 16 )
+    {
+        printk(KERN_ERR "%p:", &buf[i]);
+        for ( j = 0; j < 16; ++j )
+        {
+            int k = i + j;
+            if ( k < len )
+                printk(" %02x", buf[k]);
+            else
+                printk("   ");
+        }
+        printk(" ");
+
+        for ( j = 0; j < 16; ++j )
+        {
+            int k = i + j;
+            if ( k < len )
+                printk("%c", ((buf[k] > 32) && (buf[k] < 127)) ? buf[k] : '.');
+            else
+                printk(" ");
+        }
+        printk("\n");
+    }
+}
+#endif
+
+
+/*
+ * Event channel
+ */
+
+static void
+v4v_signal_domain(struct domain *d)
+{
+    v4v_dprintk("send guest VIRQ_V4V domid:%d\n", d->domain_id);
+
+    evtchn_send(d, d->v4v->evtchn_port);
+}
+
+static void
+v4v_signal_domid(domid_t id)
+{
+    struct domain *d = get_domain_by_id(id);
+    if ( !d )
+        return;
+    v4v_signal_domain(d);
+    put_domain(d);
+}
+
+
+/*
+ * ring buffer
+ */
+
+static void
+v4v_ring_unmap(struct v4v_ring_info *ring_info)
+{
+    int i;
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    for ( i = 0; i < ring_info->npage; ++i )
+    {
+        if ( !ring_info->mfn_mapping[i] )
+            continue;
+        v4v_dprintk("unmapping page %p from %p\n",
+                    (void*)mfn_x(ring_info->mfns[i]),
+                    ring_info->mfn_mapping[i]);
+
+        unmap_domain_page(ring_info->mfn_mapping[i]);
+        ring_info->mfn_mapping[i] = NULL;
+    }
+}
+
+static uint8_t *
+v4v_ring_map_page(struct v4v_ring_info *ring_info, int i)
+{
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    if ( i >= ring_info->npage )
+        return NULL;
+    if ( ring_info->mfn_mapping[i] )
+        return ring_info->mfn_mapping[i];
+    ring_info->mfn_mapping[i] = map_domain_page(mfn_x(ring_info->mfns[i]));
+
+    v4v_dprintk("mapping page %p to %p\n",
+                (void *)mfn_x(ring_info->mfns[i]),
+                ring_info->mfn_mapping[i]);
+    return ring_info->mfn_mapping[i];
+}
+
+static int
+v4v_memcpy_from_guest_ring(void *_dst, struct v4v_ring_info *ring_info,
+                           uint32_t offset, uint32_t len)
+{
+    int page = offset >> PAGE_SHIFT;
+    uint8_t *src;
+    uint8_t *dst = _dst;
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    offset &= PAGE_SIZE - 1;
+
+    while ( (offset + len) > PAGE_SIZE )
+    {
+        src = v4v_ring_map_page(ring_info, page);
+
+        if ( !src )
+            return -EFAULT;
+
+        v4v_dprintk("memcpy(%p,%p+%d,%d)\n",
+                    dst, src, offset,
+                    (int)(PAGE_SIZE - offset));
+        memcpy(dst, src + offset, PAGE_SIZE - offset);
+
+        page++;
+        len -= PAGE_SIZE - offset;
+        dst += PAGE_SIZE - offset;
+        offset = 0;
+    }
+
+    src = v4v_ring_map_page(ring_info, page);
+    if ( !src )
+        return -EFAULT;
+
+    v4v_dprintk("memcpy(%p,%p+%d,%d)\n", dst, src, offset, len);
+    memcpy(dst, src + offset, len);
+
+    return 0;
+}
+
+static int
+v4v_update_tx_ptr(struct v4v_ring_info *ring_info, uint32_t tx_ptr)
+{
+    uint8_t *dst = v4v_ring_map_page(ring_info, 0);
+    uint32_t *p = (uint32_t *)(dst + offsetof(v4v_ring_t, tx_ptr));
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    if ( !dst )
+        return -EFAULT;
+    write_atomic(p, tx_ptr);
+    mb();
+    return 0;
+}
+
+static int
+v4v_copy_from_guest_maybe(void *dst, void *src,
+                          XEN_GUEST_HANDLE(uint8_t) src_hnd,
+                          uint32_t len)
+{
+    int rc = 0;
+
+    if ( src )
+        memcpy(dst, src, len);
+    else
+        rc = copy_from_guest(dst, src_hnd, len);
+    return rc;
+}
+
+static int
+v4v_memcpy_to_guest_ring(struct v4v_ring_info *ring_info,
+                         uint32_t offset,
+                         void *src,
+                         XEN_GUEST_HANDLE(uint8_t) src_hnd,
+                         uint32_t len)
+{
+    int page = offset >> PAGE_SHIFT;
+    uint8_t *dst;
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    offset &= PAGE_SIZE - 1;
+
+    while ( (offset + len) > PAGE_SIZE )
+    {
+        dst = v4v_ring_map_page(ring_info, page);
+        if ( !dst )
+            return -EFAULT;
+
+        if ( v4v_copy_from_guest_maybe(dst + offset, src, src_hnd,
+                                       PAGE_SIZE - offset) )
+            return -EFAULT;
+
+        page++;
+        len -= PAGE_SIZE - offset;
+        if ( src )
+            src += (PAGE_SIZE - offset);
+        else
+            guest_handle_add_offset(src_hnd, PAGE_SIZE - offset);
+        offset = 0;
+    }
+
+    dst = v4v_ring_map_page(ring_info, page);
+    if ( !dst )
+        return -EFAULT;
+
+    if ( v4v_copy_from_guest_maybe(dst + offset, src, src_hnd, len) )
+        return -EFAULT;
+
+    return 0;
+}
+
+static int
+v4v_ringbuf_get_rx_ptr(struct domain *d, struct v4v_ring_info *ring_info,
+                        uint32_t * rx_ptr)
+{
+    v4v_ring_t *ringp;
+
+    if ( ring_info->npage == 0 )
+        return -1;
+
+    ringp = map_domain_page(mfn_x(ring_info->mfns[0]));
+
+    v4v_dprintk("v4v_ringbuf_payload_space: mapped %p to %p\n",
+                (void *)mfn_x(ring_info->mfns[0]), ringp);
+    if ( !ringp )
+        return -1;
+
+    write_atomic(rx_ptr, ringp->rx_ptr);
+    mb();
+
+    unmap_domain_page(mfn_x(ring_info->mfns[0]));
+    return 0;
+}
+
+uint32_t
+v4v_ringbuf_payload_space(struct domain * d, struct v4v_ring_info * ring_info)
+{
+    v4v_ring_t ring;
+    int32_t ret;
+
+    ring.tx_ptr = ring_info->tx_ptr;
+    ring.len = ring_info->len;
+
+    if ( v4v_ringbuf_get_rx_ptr(d, ring_info, &ring.rx_ptr) )
+        return 0;
+
+    v4v_dprintk("v4v_ringbuf_payload_space:tx_ptr=%d rx_ptr=%d\n",
+                (int)ring.tx_ptr, (int)ring.rx_ptr);
+    if ( ring.rx_ptr == ring.tx_ptr )
+        return ring.len - sizeof (struct v4v_ring_message_header);
+
+    ret = ring.rx_ptr - ring.tx_ptr;
+    if ( ret < 0 )
+        ret += ring.len;
+
+    ret -= sizeof (struct v4v_ring_message_header);
+    ret -= V4V_ROUNDUP(1);
+
+    return (ret < 0) ? 0 : ret;
+}
+
+static unsigned long
+v4v_iov_copy(v4v_iov_t *iov, internal_v4v_iov_t *iovs)
+{
+    if (iovs->xen_iov == NULL)
+    {
+        return copy_from_guest(iov, iovs->guest_iov, 1);
+    }
+    else
+    {
+        *iov = *(iovs->xen_iov);
+        return 0;
+    }
+}
+
+static void
+v4v_iov_add_offset(internal_v4v_iov_t *iovs, int offset)
+{
+    if (iovs->xen_iov == NULL)
+        guest_handle_add_offset(iovs->guest_iov, offset);
+    else
+        iovs->xen_iov += offset;
+}
+
+static long
+v4v_iov_count(internal_v4v_iov_t iovs, int niov)
+{
+    v4v_iov_t iov;
+    size_t ret = 0;
+
+    while ( niov-- )
+    {
+        if ( v4v_iov_copy(&iov, &iovs) )
+            return -EFAULT;
+
+        /* message bigger than 2G can't be sent */
+        if (ret > 2L * 1024 * 1024 * 1024)
+            return -EMSGSIZE;
+
+        ret += iov.iov_len;
+        v4v_iov_add_offset(&iovs, 1);
+    }
+
+    return ret;
+}
+
+static long
+v4v_ringbuf_insertv(struct domain *d,
+                    struct v4v_ring_info *ring_info,
+                    v4v_ring_id_t *src_id, uint32_t proto,
+                    internal_v4v_iov_t iovs, uint32_t niov,
+                    size_t len)
+{
+    v4v_ring_t ring;
+    struct v4v_ring_message_header mh = { 0 };
+    int32_t sp;
+    long happy_ret;
+    int32_t ret = 0;
+    XEN_GUEST_HANDLE(uint8_t) empty_hnd = { 0 };
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    happy_ret = len;
+
+    if ( (V4V_ROUNDUP(len) + sizeof (struct v4v_ring_message_header) ) >=
+            ring_info->len)
+        return -EMSGSIZE;
+
+    do
+    {
+        if ( (ret = v4v_memcpy_from_guest_ring(&ring, ring_info, 0,
+                                               sizeof (ring))) )
+            break;
+
+        ring.tx_ptr = ring_info->tx_ptr;
+        ring.len = ring_info->len;
+
+        v4v_dprintk("ring.tx_ptr=%d ring.rx_ptr=%d ring.len=%d ring_info->tx_ptr=%d\n",
+                    ring.tx_ptr, ring.rx_ptr, ring.len, ring_info->tx_ptr);
+
+        if ( ring.rx_ptr == ring.tx_ptr )
+            sp = ring_info->len;
+        else
+        {
+            sp = ring.rx_ptr - ring.tx_ptr;
+            if ( sp < 0 )
+                sp += ring.len;
+        }
+
+        if ( (V4V_ROUNDUP(len) + sizeof (struct v4v_ring_message_header)) >= sp )
+        {
+            v4v_dprintk("EAGAIN\n");
+            ret = -EAGAIN;
+            break;
+        }
+
+        mh.len = len + sizeof (struct v4v_ring_message_header);
+        mh.source = src_id->addr;
+        mh.message_type = proto;
+
+        if ( (ret = v4v_memcpy_to_guest_ring(ring_info,
+                                             ring.tx_ptr + sizeof (v4v_ring_t),
+                                             &mh, empty_hnd,
+                                             sizeof (mh))) )
+            break;
+
+        ring.tx_ptr += sizeof (mh);
+        if ( ring.tx_ptr == ring_info->len )
+            ring.tx_ptr = 0;
+
+        while ( niov-- )
+        {
+            XEN_GUEST_HANDLE(uint8_t) buf_hnd;
+            v4v_iov_t iov;
+
+            if ( v4v_iov_copy(&iov, &iovs) )
+            {
+                ret = -EFAULT;
+                break;
+            }
+
+            buf_hnd.p = (uint8_t *)iov.iov_base; //FIXME
+            len = iov.iov_len;
+
+            if ( unlikely(!guest_handle_okay(buf_hnd, len)) )
+            {
+                ret = -EFAULT;
+                break;
+            }
+
+            sp = ring.len - ring.tx_ptr;
+
+            if ( len > sp )
+            {
+                ret = v4v_memcpy_to_guest_ring(ring_info,
+                        ring.tx_ptr + sizeof (v4v_ring_t),
+                        NULL, buf_hnd, sp);
+                if ( ret )
+                    break;
+
+                ring.tx_ptr = 0;
+                len -= sp;
+                guest_handle_add_offset(buf_hnd, sp);
+            }
+
+            ret = v4v_memcpy_to_guest_ring(ring_info,
+                    ring.tx_ptr + sizeof (v4v_ring_t),
+                    NULL, buf_hnd, len);
+            if ( ret )
+                break;
+
+            ring.tx_ptr += len;
+
+            if ( ring.tx_ptr == ring_info->len )
+                ring.tx_ptr = 0;
+
+            v4v_iov_add_offset(&iovs, 1);
+        }
+        if ( ret )
+            break;
+
+        ring.tx_ptr = V4V_ROUNDUP(ring.tx_ptr);
+
+        if ( ring.tx_ptr >= ring_info->len )
+            ring.tx_ptr -= ring_info->len;
+
+        mb();
+        ring_info->tx_ptr = ring.tx_ptr;
+        if ( (ret = v4v_update_tx_ptr(ring_info, ring.tx_ptr)) )
+            break;
+    }
+    while ( 0 );
+
+    v4v_ring_unmap(ring_info);
+
+    return ret ? ret : happy_ret;
+}
+
+
+
+/* pending */
+static void
+v4v_pending_remove_ent(struct v4v_pending_ent *ent)
+{
+    hlist_del(&ent->node);
+    xfree(ent);
+}
+
+static void
+v4v_pending_remove_all(struct v4v_ring_info *info)
+{
+    struct hlist_node *node, *next;
+    struct v4v_pending_ent *pending_ent;
+
+    ASSERT(spin_is_locked(&info->lock));
+    hlist_for_each_entry_safe(pending_ent, node, next, &info->pending,
+            node) v4v_pending_remove_ent(pending_ent);
+}
+
+static void
+v4v_pending_notify(struct domain *caller_d, struct hlist_head *to_notify)
+{
+    struct hlist_node *node, *next;
+    struct v4v_pending_ent *pending_ent;
+
+    ASSERT(rw_is_locked(&v4v_lock));
+
+    hlist_for_each_entry_safe(pending_ent, node, next, to_notify, node)
+    {
+        hlist_del(&pending_ent->node);
+        v4v_signal_domid(pending_ent->id);
+        xfree(pending_ent);
+    }
+
+}
+
+static void
+v4v_pending_find(struct domain *d, struct v4v_ring_info *ring_info,
+                 uint32_t payload_space, struct hlist_head *to_notify)
+{
+    struct hlist_node *node, *next;
+    struct v4v_pending_ent *ent;
+
+    ASSERT(rw_is_locked(&d->v4v->lock));
+
+    spin_lock(&ring_info->lock);
+    hlist_for_each_entry_safe(ent, node, next, &ring_info->pending, node)
+    {
+        if ( payload_space >= ent->len )
+        {
+            hlist_del(&ent->node);
+            hlist_add_head(&ent->node, to_notify);
+        }
+    }
+    spin_unlock(&ring_info->lock);
+}
+
+/*caller must have L3 */
+static int
+v4v_pending_queue(struct v4v_ring_info *ring_info, domid_t src_id, int len)
+{
+    struct v4v_pending_ent *ent = xmalloc(struct v4v_pending_ent);
+
+    if ( !ent )
+    {
+        v4v_dprintk("ENOMEM\n");
+        return -ENOMEM;
+    }
+
+    ent->len = len;
+    ent->id = src_id;
+
+    hlist_add_head(&ent->node, &ring_info->pending);
+
+    return 0;
+}
+
+/* L3 */
+static int
+v4v_pending_requeue(struct v4v_ring_info *ring_info, domid_t src_id, int len)
+{
+    struct hlist_node *node;
+    struct v4v_pending_ent *ent;
+
+    hlist_for_each_entry(ent, node, &ring_info->pending, node)
+    {
+        if ( ent->id == src_id )
+        {
+            if ( ent->len < len )
+                ent->len = len;
+            return 0;
+        }
+    }
+
+    return v4v_pending_queue(ring_info, src_id, len);
+}
+
+
+/* L3 */
+static void
+v4v_pending_cancel(struct v4v_ring_info *ring_info, domid_t src_id)
+{
+    struct hlist_node *node, *next;
+    struct v4v_pending_ent *ent;
+
+    hlist_for_each_entry_safe(ent, node, next, &ring_info->pending, node)
+    {
+        if ( ent->id == src_id)
+        {
+            hlist_del(&ent->node);
+            xfree(ent);
+        }
+    }
+}
+
+/*
+ * ring data
+ */
+
+/*Caller should hold R(L1)*/
+static int
+v4v_fill_ring_data(struct domain *src_d,
+                   XEN_GUEST_HANDLE(v4v_ring_data_ent_t) data_ent_hnd)
+{
+    v4v_ring_data_ent_t ent;
+    struct domain *dst_d;
+    struct v4v_ring_info *ring_info;
+
+    if ( copy_from_guest(&ent, data_ent_hnd, 1) )
+    {
+        v4v_dprintk("EFAULT\n");
+        return -EFAULT;
+    }
+
+    v4v_dprintk("v4v_fill_ring_data: ent.ring.domain=%d,ent.ring.port=%u\n",
+                (int)ent.ring.domain, (int)ent.ring.port);
+
+    ent.flags = 0;
+
+    dst_d = get_domain_by_id(ent.ring.domain);
+
+    if ( dst_d && dst_d->v4v )
+    {
+        read_lock(&dst_d->v4v->lock);
+        ring_info = v4v_ring_find_info_by_addr(dst_d, &ent.ring,
+                                               src_d->domain_id);
+
+        if ( ring_info )
+        {
+            uint32_t space_avail;
+
+            ent.flags |= V4V_RING_DATA_F_EXISTS;
+            ent.max_message_size =
+                ring_info->len - sizeof (struct v4v_ring_message_header) -
+                V4V_ROUNDUP(1);
+            spin_lock(&ring_info->lock);
+
+            space_avail = v4v_ringbuf_payload_space(dst_d, ring_info);
+
+            if ( space_avail >= ent.space_required )
+            {
+                v4v_pending_cancel(ring_info, src_d->domain_id);
+                ent.flags |= V4V_RING_DATA_F_SUFFICIENT;
+            }
+            else
+            {
+                v4v_pending_requeue(ring_info, src_d->domain_id,
+                        ent.space_required);
+                ent.flags |= V4V_RING_DATA_F_PENDING;
+            }
+
+            spin_unlock(&ring_info->lock);
+
+            if ( space_avail == ent.max_message_size )
+                ent.flags |= V4V_RING_DATA_F_EMPTY;
+
+        }
+        read_unlock(&dst_d->v4v->lock);
+    }
+
+    if ( dst_d )
+        put_domain(dst_d);
+
+    if ( copy_field_to_guest(data_ent_hnd, &ent, flags) )
+    {
+        v4v_dprintk("EFAULT\n");
+        return -EFAULT;
+    }
+    return 0;
+}
+
+/*Called should hold no more than R(L1) */
+static int
+v4v_fill_ring_datas(struct domain *d, int nent,
+                     XEN_GUEST_HANDLE(v4v_ring_data_ent_t) data_ent_hnd)
+{
+    int ret = 0;
+
+    read_lock(&v4v_lock);
+    while ( !ret && nent-- )
+    {
+        ret = v4v_fill_ring_data(d, data_ent_hnd);
+        guest_handle_add_offset(data_ent_hnd, 1);
+    }
+    read_unlock(&v4v_lock);
+    return ret;
+}
+
+/*
+ * ring
+ */
+static int
+v4v_find_ring_mfns(struct domain *d, struct v4v_ring_info *ring_info,
+                   uint32_t npage, XEN_GUEST_HANDLE(xen_pfn_t) pfn_hnd)
+{
+    int i,j;
+    mfn_t *mfns;
+    uint8_t **mfn_mapping;
+    unsigned long mfn;
+    struct page_info *page;
+    int ret = 0;
+
+    if ( (npage << PAGE_SHIFT) < ring_info->len )
+    {
+        v4v_dprintk("EINVAL\n");
+        return -EINVAL;
+    }
+
+    mfns = xmalloc_array(mfn_t, npage);
+    if ( !mfns )
+    {
+        v4v_dprintk("ENOMEM\n");
+        return -ENOMEM;
+    }
+
+    mfn_mapping = xmalloc_array(uint8_t *, npage);
+    if ( !mfn_mapping )
+    {
+        xfree(mfns);
+        return -ENOMEM;
+    }
+
+    for ( i = 0; i < npage; ++i )
+    {
+        unsigned long pfn;
+        p2m_type_t p2mt;
+
+        if ( copy_from_guest_offset(&pfn, pfn_hnd, i, 1) )
+        {
+            ret = -EFAULT;
+            v4v_dprintk("EFAULT\n");
+            break;
+        }
+
+        mfn = mfn_x(get_gfn(d, pfn, &p2mt));
+        if ( !mfn_valid(mfn) )
+        {
+            printk(KERN_ERR "v4v domain %d passed invalid mfn %"PRI_mfn" ring %p seq %d\n",
+                    d->domain_id, mfn, ring_info, i);
+            ret = -EINVAL;
+            break;
+        }
+        page = mfn_to_page(mfn);
+        if ( !get_page_and_type(page, d, PGT_writable_page) )
+        {
+            printk(KERN_ERR "v4v domain %d passed wrong type mfn %"PRI_mfn" ring %p seq %d\n",
+                    d->domain_id, mfn, ring_info, i);
+            ret = -EINVAL;
+            break;
+        }
+        mfns[i] = _mfn(mfn);
+        v4v_dprintk("v4v_find_ring_mfns: %d: %lx -> %lx\n",
+                    i, (unsigned long)pfn, (unsigned long)mfn_x(mfns[i]));
+        mfn_mapping[i] = NULL;
+        put_gfn(d, pfn);
+    }
+
+    if ( !ret )
+    {
+        ring_info->npage = npage;
+        ring_info->mfns = mfns;
+        ring_info->mfn_mapping = mfn_mapping;
+    }
+    else
+    {
+        j = i;
+        for ( i = 0; i < j; ++i )
+            if ( mfn_x(mfns[i]) != 0 )
+                put_page_and_type(mfn_to_page(mfn_x(mfns[i])));
+        xfree(mfn_mapping);
+        xfree(mfns);
+    }
+    return ret;
+}
+
+
+static struct v4v_ring_info *
+v4v_ring_find_info(struct domain *d, v4v_ring_id_t *id)
+{
+    uint16_t hash;
+    struct hlist_node *node;
+    struct v4v_ring_info *ring_info;
+
+    ASSERT(rw_is_locked(&d->v4v->lock));
+
+    hash = v4v_hash_fn(id);
+
+    v4v_dprintk("ring_find_info: d->v4v=%p, d->v4v->ring_hash[%d]=%p id=%p\n",
+                d->v4v, (int)hash, d->v4v->ring_hash[hash].first, id);
+    v4v_dprintk("ring_find_info: id.addr.port=%d id.addr.domain=%d id.addr.partner=%d\n",
+                id->addr.port, id->addr.domain, id->partner);
+
+    hlist_for_each_entry(ring_info, node, &d->v4v->ring_hash[hash], node)
+    {
+        v4v_ring_id_t *cmpid = &ring_info->id;
+
+        if ( cmpid->addr.port == id->addr.port &&
+             cmpid->addr.domain == id->addr.domain &&
+             cmpid->partner == id->partner)
+        {
+            v4v_dprintk("ring_find_info: ring_info=%p\n", ring_info);
+            return ring_info;
+        }
+    }
+    v4v_dprintk("ring_find_info: no ring_info found\n");
+    return NULL;
+}
+
+static struct v4v_ring_info *
+v4v_ring_find_info_by_addr(struct domain *d, struct v4v_addr *a, domid_t p)
+{
+    v4v_ring_id_t id;
+    struct v4v_ring_info *ret;
+
+    ASSERT(rw_is_locked(&d->v4v->lock));
+
+    if ( !a )
+        return NULL;
+
+    id.addr.port = a->port;
+    id.addr.domain = d->domain_id;
+    id.partner = p;
+
+    ret = v4v_ring_find_info(d, &id);
+    if ( ret )
+        return ret;
+
+    id.partner = V4V_DOMID_ANY;
+
+    return v4v_ring_find_info(d, &id);
+}
+
+static void v4v_ring_remove_mfns(struct domain *d, struct v4v_ring_info *ring_info)
+{
+    int i;
+
+    ASSERT(rw_is_write_locked(&d->v4v->lock));
+
+    if ( ring_info->mfns )
+    {
+        for ( i = 0; i < ring_info->npage; ++i )
+            if ( mfn_x(ring_info->mfns[i]) != 0 )
+                put_page_and_type(mfn_to_page(mfn_x(ring_info->mfns[i])));
+        xfree(ring_info->mfns);
+    }
+    if ( ring_info->mfn_mapping )
+        xfree(ring_info->mfn_mapping);
+    ring_info->mfns = NULL;
+
+}
+
+static void
+v4v_ring_remove_info(struct domain *d, struct v4v_ring_info *ring_info)
+{
+    ASSERT(rw_is_write_locked(&d->v4v->lock));
+
+    spin_lock(&ring_info->lock);
+
+    v4v_pending_remove_all(ring_info);
+    hlist_del(&ring_info->node);
+    v4v_ring_remove_mfns(d, ring_info);
+
+    spin_unlock(&ring_info->lock);
+
+    xfree(ring_info);
+}
+
+/* Call from guest to unpublish a ring */
+static long
+v4v_ring_remove(struct domain *d, XEN_GUEST_HANDLE(v4v_ring_t) ring_hnd)
+{
+    struct v4v_ring ring;
+    struct v4v_ring_info *ring_info;
+    int ret = 0;
+
+    read_lock(&v4v_lock);
+
+    do
+    {
+        if ( !d->v4v )
+        {
+            v4v_dprintk("EINVAL\n");
+            ret = -EINVAL;
+            break;
+        }
+
+        if ( copy_from_guest(&ring, ring_hnd, 1) )
+        {
+            v4v_dprintk("EFAULT\n");
+            ret = -EFAULT;
+            break;
+        }
+
+        if ( ring.magic != V4V_RING_MAGIC )
+        {
+            v4v_dprintk("ring.magic(%lx) != V4V_RING_MAGIC(%lx), EINVAL\n",
+                    ring.magic, V4V_RING_MAGIC);
+            ret = -EINVAL;
+            break;
+        }
+
+        ring.id.addr.domain = d->domain_id;
+
+        write_lock(&d->v4v->lock);
+        ring_info = v4v_ring_find_info(d, &ring.id);
+
+        if ( ring_info )
+            v4v_ring_remove_info(d, ring_info);
+
+        write_unlock(&d->v4v->lock);
+
+        if ( !ring_info )
+        {
+            v4v_dprintk("ENOENT\n");
+            ret = -ENOENT;
+            break;
+        }
+
+    }
+    while ( 0 );
+
+    read_unlock(&v4v_lock);
+    return ret;
+}
+
+/* call from guest to publish a ring */
+static long
+v4v_ring_add(struct domain *d, XEN_GUEST_HANDLE(v4v_ring_t) ring_hnd,
+             uint32_t npage, XEN_GUEST_HANDLE(xen_pfn_t) pfn_hnd)
+{
+    struct v4v_ring ring;
+    struct v4v_ring_info *ring_info;
+    int need_to_insert = 0;
+    int ret = 0;
+
+    if ( (long)ring_hnd.p & (PAGE_SIZE - 1) )
+    {
+        v4v_dprintk("EINVAL\n");
+        return -EINVAL;
+    }
+
+    read_lock(&v4v_lock);
+    do
+    {
+        if ( !d->v4v )
+        {
+            v4v_dprintk(" !d->v4v, EINVAL\n");
+            ret = -EINVAL;
+            break;
+        }
+
+        if ( copy_from_guest(&ring, ring_hnd, 1) )
+        {
+            v4v_dprintk(" copy_from_guest failed, EFAULT\n");
+            ret = -EFAULT;
+            break;
+        }
+
+        if ( ring.magic != V4V_RING_MAGIC )
+        {
+            v4v_dprintk("ring.magic(%lx) != V4V_RING_MAGIC(%lx), EINVAL\n",
+                        ring.magic, V4V_RING_MAGIC);
+            ret = -EINVAL;
+            break;
+        }
+
+        if ( (ring.len <
+                    (sizeof (struct v4v_ring_message_header) + V4V_ROUNDUP(1) +
+                     V4V_ROUNDUP(1))) || (V4V_ROUNDUP(ring.len) != ring.len) )
+        {
+            v4v_dprintk("EINVAL\n");
+            ret = -EINVAL;
+            break;
+        }
+
+        ring.id.addr.domain = d->domain_id;
+        if ( copy_field_to_guest(ring_hnd, &ring, id) )
+        {
+            v4v_dprintk("EFAULT\n");
+            ret = -EFAULT;
+            break;
+        }
+
+        /*
+         * no need for a lock yet, because only we know about this
+         * set the tx pointer if it looks bogus (we don't reset it
+         * because this might be a re-register after S4)
+         */
+        if ( (ring.tx_ptr >= ring.len)
+                || (V4V_ROUNDUP(ring.tx_ptr) != ring.tx_ptr) )
+        {
+            ring.tx_ptr = ring.rx_ptr;
+        }
+        copy_field_to_guest(ring_hnd, &ring, tx_ptr);
+
+        read_lock(&d->v4v->lock);
+        ring_info = v4v_ring_find_info(d, &ring.id);
+
+        if ( !ring_info )
+        {
+            read_unlock(&d->v4v->lock);
+            ring_info = xmalloc(struct v4v_ring_info);
+            if ( !ring_info )
+            {
+                v4v_dprintk("ENOMEM\n");
+                ret = -ENOMEM;
+                break;
+            }
+            need_to_insert++;
+            spin_lock_init(&ring_info->lock);
+            INIT_HLIST_HEAD(&ring_info->pending);
+            ring_info->mfns = NULL;
+
+        }
+        else
+        {
+            /*
+             * Ring info already existed.
+             */
+            printk(KERN_INFO "v4v: dom%d ring already registered\n",
+                    current->domain->domain_id);
+            ret = -EEXIST;
+            break;
+        }
+
+        spin_lock(&ring_info->lock);
+        ring_info->id = ring.id;
+        ring_info->len = ring.len;
+        ring_info->tx_ptr = ring.tx_ptr;
+        ring_info->ring = ring_hnd;
+        if ( ring_info->mfns )
+            xfree(ring_info->mfns);
+        ret = v4v_find_ring_mfns(d, ring_info, npage, pfn_hnd);
+        spin_unlock(&ring_info->lock);
+        if ( ret )
+            break;
+
+        if ( !need_to_insert )
+        {
+            read_unlock(&d->v4v->lock);
+        }
+        else
+        {
+            uint16_t hash = v4v_hash_fn(&ring.id);
+            write_lock(&d->v4v->lock);
+            hlist_add_head(&ring_info->node, &d->v4v->ring_hash[hash]);
+            write_unlock(&d->v4v->lock);
+        }
+    }
+    while ( 0 );
+
+    read_unlock(&v4v_lock);
+    return ret;
+}
+
+
+/*
+ * io
+ */
+
+static void
+v4v_notify_ring(struct domain *d, struct v4v_ring_info *ring_info,
+                struct hlist_head *to_notify)
+{
+    uint32_t space;
+
+    ASSERT(rw_is_locked(&v4v_lock));
+    ASSERT(rw_is_locked(&d->v4v->lock));
+
+    spin_lock(&ring_info->lock);
+    space = v4v_ringbuf_payload_space(d, ring_info);
+    spin_unlock(&ring_info->lock);
+
+    v4v_pending_find(d, ring_info, space, to_notify);
+}
+
+/*notify hypercall*/
+static long
+v4v_notify(struct domain *d,
+           XEN_GUEST_HANDLE(v4v_ring_data_t) ring_data_hnd)
+{
+    v4v_ring_data_t ring_data;
+    HLIST_HEAD(to_notify);
+    int i;
+    int ret = 0;
+
+    read_lock(&v4v_lock);
+
+    if ( !d->v4v )
+    {
+        read_unlock(&v4v_lock);
+        v4v_dprintk("!d->v4v, ENODEV\n");
+        return -ENODEV;
+    }
+
+    read_lock(&d->v4v->lock);
+    for ( i = 0; i < V4V_HTABLE_SIZE; ++i )
+    {
+        struct hlist_node *node, *next;
+        struct v4v_ring_info *ring_info;
+
+        hlist_for_each_entry_safe(ring_info, node,
+                next, &d->v4v->ring_hash[i],
+                node)
+        {
+            v4v_notify_ring(d, ring_info, &to_notify);
+        }
+    }
+    read_unlock(&d->v4v->lock);
+
+    if ( !hlist_empty(&to_notify) )
+        v4v_pending_notify(d, &to_notify);
+
+    do
+    {
+        if ( !guest_handle_is_null(ring_data_hnd) )
+        {
+            /* Quick sanity check on ring_data_hnd */
+            if ( copy_field_from_guest(&ring_data, ring_data_hnd, magic) )
+            {
+                v4v_dprintk("copy_field_from_guest failed\n");
+                ret = -EFAULT;
+                break;
+            }
+
+            if ( ring_data.magic != V4V_RING_DATA_MAGIC )
+            {
+                v4v_dprintk("ring.magic(%lx) != V4V_RING_MAGIC(%lx), EINVAL\n",
+                        ring_data.magic, V4V_RING_MAGIC);
+                ret = -EINVAL;
+                break;
+            }
+
+            if ( copy_from_guest(&ring_data, ring_data_hnd, 1) )
+            {
+                v4v_dprintk("copy_from_guest failed\n");
+                ret = -EFAULT;
+                break;
+            }
+
+            {
+                XEN_GUEST_HANDLE(v4v_ring_data_ent_t) ring_data_ent_hnd;
+                ring_data_ent_hnd =
+                    guest_handle_for_field(ring_data_hnd, v4v_ring_data_ent_t, data[0]);
+                ret = v4v_fill_ring_datas(d, ring_data.nent, ring_data_ent_hnd);
+            }
+        }
+    }
+    while ( 0 );
+
+    read_unlock(&v4v_lock);
+
+    return ret;
+}
+
+#ifdef V4V_DEBUG
+void
+v4v_viptables_print_rule(struct v4v_viptables_rule_node *node)
+{
+    v4v_viptables_rule_t *rule;
+
+    if ( node == NULL )
+    {
+        printk("(null)\n");
+        return;
+    }
+
+    rule = &node->rule;
+
+    if ( rule->accept == 1 )
+        printk("ACCEPT");
+    else
+        printk("REJECT");
+
+    printk(" ");
+
+    if ( rule->src.domain == V4V_DOMID_ANY )
+        printk("*");
+    else
+        printk("%i", rule->src.domain);
+
+    printk(":");
+
+    if ( rule->src.port == -1 )
+        printk("*");
+    else
+        printk("%i", rule->src.port);
+
+    printk(" -> ");
+
+    if ( rule->dst.domain == V4V_DOMID_ANY )
+        printk("*");
+    else
+        printk("%i", rule->dst.domain);
+
+    printk(":");
+
+    if ( rule->dst.port == -1 )
+        printk("*");
+    else
+        printk("%i", rule->dst.port);
+
+    printk("\n");
+}
+#endif /* V4V_DEBUG */
+
+int
+v4v_viptables_add(struct domain *src_d,
+                  XEN_GUEST_HANDLE(v4v_viptables_rule_t) rule,
+                  int32_t position)
+{
+    struct v4v_viptables_rule_node* new = NULL;
+    struct list_head* tmp;
+
+    ASSERT(rw_is_write_locked(&viprules_lock));
+
+    /* First rule is n.1 */
+    position--;
+
+    new = xmalloc(struct v4v_viptables_rule_node);
+    if ( new == NULL )
+        return -ENOMEM;
+
+    if ( copy_from_guest(&new->rule, rule, 1) )
+    {
+        xfree(new);
+        return -EFAULT;
+    }
+
+#ifdef V4V_DEBUG
+    printk(KERN_ERR "VIPTables: ");
+    v4v_viptables_print_rule(new);
+#endif /* V4V_DEBUG */
+
+    tmp = &viprules;
+    while ( position != 0 && tmp->next != &viprules )
+    {
+        tmp = tmp->next;
+        position--;
+    }
+    list_add(&new->list, tmp);
+
+    return 0;
+}
+
+int
+v4v_viptables_del(struct domain *src_d,
+                  XEN_GUEST_HANDLE(v4v_viptables_rule_t) rule_hnd,
+                  int32_t position)
+{
+    struct list_head *tmp = NULL;
+    struct list_head *next = NULL;
+    struct v4v_viptables_rule_node *node;
+
+    ASSERT(rw_is_write_locked(&viprules_lock));
+
+    v4v_dprintk("v4v_viptables_del position:%d\n", position);
+
+    if ( position != -1 )
+    {
+        /* We want to delete the rule number <position> */
+        tmp = &viprules;
+        while ( position != 0 && tmp->next != &viprules )
+        {
+            tmp = tmp->next;
+            position--;
+        }
+    }
+    else if ( !guest_handle_is_null(rule_hnd) )
+    {
+        struct v4v_viptables_rule r;
+
+        if ( copy_from_guest(&r, rule_hnd, 1) )
+            return -EFAULT;
+
+        list_for_each(tmp, &viprules)
+        {
+            node = list_entry(tmp, struct v4v_viptables_rule_node, list);
+
+            if ( (node->rule.src.domain == r.src.domain) &&
+                 (node->rule.src.port   == r.src.port)   &&
+                 (node->rule.dst.domain == r.dst.domain) &&
+                 (node->rule.dst.port   == r.dst.port))
+            {
+                position = 0;
+                break;
+            }
+        }
+    }
+    else
+    {
+        /* We want to flush the rules! */
+        printk(KERN_ERR "VIPTables: flushing rules\n");
+        list_for_each_safe(tmp, next, &viprules)
+        {
+            node = list_entry(tmp, struct v4v_viptables_rule_node, list);
+            list_del(tmp);
+            xfree(node);
+        }
+    }
+
+    if ( position == 0 && tmp != &viprules )
+    {
+        node = list_entry(tmp, struct v4v_viptables_rule_node, list);
+#ifdef V4V_DEBUG
+        printk(KERN_ERR "VIPTables: deleting rule: ");
+        v4v_viptables_print_rule(node);
+#endif /* V4V_DEBUG */
+        list_del(tmp);
+        xfree(node);
+    }
+
+    return 0;
+}
+
+static size_t
+v4v_viptables_list(struct domain *src_d,
+                   XEN_GUEST_HANDLE(v4v_viptables_list_t) list_hnd)
+{
+    struct list_head *ptr;
+    struct v4v_viptables_rule_node *node;
+    struct v4v_viptables_list rules_list;
+    uint32_t nbrules;
+    XEN_GUEST_HANDLE(v4v_viptables_rule_t) guest_rules;
+
+    ASSERT(rw_is_locked(&viprules_lock));
+
+    memset(&rules_list, 0, sizeof (rules_list));
+    if ( copy_from_guest(&rules_list, list_hnd, 1) )
+        return -EFAULT;
+
+    ptr = viprules.next;
+    while ( rules_list.start_rule != 0 && ptr->next != &viprules )
+    {
+        ptr = ptr->next;
+        rules_list.start_rule--;
+    }
+
+    if ( rules_list.nb_rules == 0 )
+        return -EINVAL;
+
+    guest_rules = guest_handle_for_field(list_hnd, v4v_viptables_rule_t, rules[0]);
+
+    nbrules = 0;
+    while ( nbrules < rules_list.nb_rules && ptr != &viprules )
+    {
+        node = list_entry(ptr, struct v4v_viptables_rule_node, list);
+
+        if ( !guest_handle_okay(guest_rules, 1) )
+            break;
+
+        if ( copy_to_guest(guest_rules, &node->rule, 1) )
+            break;
+
+        guest_handle_add_offset(guest_rules, 1);
+
+        nbrules++;
+        ptr = ptr->next;
+    }
+
+    rules_list.nb_rules = nbrules;
+    if ( copy_field_to_guest(list_hnd, &rules_list, nb_rules) )
+        return -EFAULT;
+
+    return 0;
+}
+
+static size_t
+v4v_viptables_check(v4v_addr_t * src, v4v_addr_t * dst)
+{
+    struct list_head *ptr;
+    struct v4v_viptables_rule_node *node;
+    size_t ret = 0; /* Defaulting to ACCEPT */
+
+    read_lock(&viprules_lock);
+
+    list_for_each(ptr, &viprules)
+    {
+        node = list_entry(ptr, struct v4v_viptables_rule_node, list);
+
+        if ( (node->rule.src.domain == V4V_DOMID_ANY ||
+              node->rule.src.domain == src->domain) &&
+             (node->rule.src.port == V4V_PORT_ANY ||
+              node->rule.src.port == src->port) &&
+             (node->rule.dst.domain == V4V_DOMID_ANY ||
+              node->rule.dst.domain == dst->domain) &&
+             (node->rule.dst.port == V4V_PORT_ANY ||
+              node->rule.dst.port == dst->port) )
+        {
+            ret = !node->rule.accept;
+            break;
+        }
+    }
+
+    read_unlock(&viprules_lock);
+    return ret;
+}
+
+/*
+ * Hypercall to do the send
+ */
+static long
+v4v_sendv(struct domain *src_d, v4v_addr_t * src_addr,
+          v4v_addr_t * dst_addr, uint32_t proto,
+          internal_v4v_iov_t iovs, size_t niov)
+{
+    struct domain *dst_d;
+    v4v_ring_id_t src_id;
+    struct v4v_ring_info *ring_info;
+    int ret = 0;
+
+    if ( !dst_addr )
+    {
+        v4v_dprintk("!dst_addr, EINVAL\n");
+        return -EINVAL;
+    }
+
+    read_lock(&v4v_lock);
+    if ( !src_d->v4v )
+    {
+        read_unlock(&v4v_lock);
+        v4v_dprintk("!src_d->v4v, EINVAL\n");
+        return -EINVAL;
+    }
+
+    src_id.addr.port = src_addr->port;
+    src_id.addr.domain = src_d->domain_id;
+    src_id.partner = dst_addr->domain;
+
+    dst_d = get_domain_by_id(dst_addr->domain);
+    if ( !dst_d )
+    {
+        read_unlock(&v4v_lock);
+        v4v_dprintk("!dst_d, ECONNREFUSED\n");
+        return -ECONNREFUSED;
+    }
+
+    if ( v4v_viptables_check(src_addr, dst_addr) != 0 )
+    {
+        read_unlock(&v4v_lock);
+        gdprintk(XENLOG_WARNING,
+                 "V4V: VIPTables REJECTED %i:%i -> %i:%i\n",
+                 src_addr->domain, src_addr->port,
+                 dst_addr->domain, dst_addr->port);
+        return -ECONNREFUSED;
+    }
+
+    do
+    {
+        if ( !dst_d->v4v )
+        {
+            v4v_dprintk("dst_d->v4v, ECONNREFUSED\n");
+            ret = -ECONNREFUSED;
+            break;
+        }
+
+        read_lock(&dst_d->v4v->lock);
+        ring_info =
+            v4v_ring_find_info_by_addr(dst_d, dst_addr, src_addr->domain);
+
+        if ( !ring_info )
+        {
+            ret = -ECONNREFUSED;
+            v4v_dprintk(" !ring_info, ECONNREFUSED\n");
+        }
+        else
+        {
+            long len = v4v_iov_count(iovs, niov);
+
+            if ( len < 0 )
+            {
+                ret = len;
+                break;
+            }
+
+            spin_lock(&ring_info->lock);
+            ret =
+                v4v_ringbuf_insertv(dst_d, ring_info, &src_id, proto, iovs,
+                        niov, len);
+            if ( ret == -EAGAIN )
+            {
+                v4v_dprintk("v4v_ringbuf_insertv failed, EAGAIN\n");
+                /* Schedule a wake up on the event channel when space is there */
+                if ( v4v_pending_requeue(ring_info, src_d->domain_id, len) )
+                {
+                    v4v_dprintk("v4v_pending_requeue failed, ENOMEM\n");
+                    ret = -ENOMEM;
+                }
+            }
+            spin_unlock(&ring_info->lock);
+
+            if ( ret >= 0 )
+            {
+                v4v_signal_domain(dst_d);
+            }
+
+        }
+        read_unlock(&dst_d->v4v->lock);
+
+    }
+    while ( 0 );
+
+    put_domain(dst_d);
+    read_unlock(&v4v_lock);
+    return ret;
+}
+
+static void
+v4v_info(struct domain *d, v4v_info_t *info)
+{
+    read_lock(&d->v4v->lock);
+    info->ring_magic = V4V_RING_MAGIC;
+    info->data_magic = V4V_RING_DATA_MAGIC;
+    info->evtchn = d->v4v->evtchn_port;
+    read_unlock(&d->v4v->lock);
+}
+
+/*
+ * hypercall glue
+ */
+long
+do_v4v_op(int cmd, XEN_GUEST_HANDLE(void) arg1,
+          XEN_GUEST_HANDLE(void) arg2,
+          uint32_t arg3, uint32_t arg4)
+{
+    struct domain *d = current->domain;
+    long rc = -EFAULT;
+
+    v4v_dprintk("->do_v4v_op(%d,%p,%p,%d,%d)\n", cmd,
+                (void *)arg1.p, (void *)arg2.p, (int) arg3, (int) arg4);
+
+    domain_lock(d);
+    switch (cmd)
+    {
+        case V4VOP_register_ring:
+            {
+                XEN_GUEST_HANDLE(v4v_ring_t) ring_hnd =
+                    guest_handle_cast(arg1, v4v_ring_t);
+                XEN_GUEST_HANDLE(xen_pfn_t) pfn_hnd =
+                    guest_handle_cast(arg2, xen_pfn_t);
+                uint32_t npage = arg3;
+                if ( unlikely(!guest_handle_okay(ring_hnd, 1)) )
+                    goto out;
+                if ( unlikely(!guest_handle_okay(pfn_hnd, npage)) )
+                    goto out;
+                rc = v4v_ring_add(d, ring_hnd, npage, pfn_hnd);
+                break;
+            }
+        case V4VOP_unregister_ring:
+            {
+                XEN_GUEST_HANDLE(v4v_ring_t) ring_hnd =
+                    guest_handle_cast(arg1, v4v_ring_t);
+                if ( unlikely(!guest_handle_okay(ring_hnd, 1)) )
+                    goto out;
+                rc = v4v_ring_remove(d, ring_hnd);
+                break;
+            }
+        case V4VOP_send:
+            {
+                uint32_t len = arg3;
+                uint32_t message_type = arg4;
+                v4v_iov_t iov;
+                internal_v4v_iov_t iovs;
+                XEN_GUEST_HANDLE(v4v_send_addr_t) addr_hnd =
+                    guest_handle_cast(arg1, v4v_send_addr_t);
+                v4v_send_addr_t addr;
+
+                if ( unlikely(!guest_handle_okay(addr_hnd, 1)) )
+                    goto out;
+                if ( copy_from_guest(&addr, addr_hnd, 1) )
+                    goto out;
+
+                iov.iov_base = (uint64_t)arg2.p; // FIXME
+                iov.iov_len = len;
+                iovs.xen_iov = &iov;
+                rc = v4v_sendv(d, &addr.src, &addr.dst, message_type, iovs, 1);
+                break;
+            }
+        case V4VOP_sendv:
+            {
+                internal_v4v_iov_t iovs;
+                uint32_t niov = arg3;
+                uint32_t message_type = arg4;
+                XEN_GUEST_HANDLE(v4v_send_addr_t) addr_hnd =
+                    guest_handle_cast(arg1, v4v_send_addr_t);
+                v4v_send_addr_t addr;
+
+                memset(&iovs, 0, sizeof (iovs));
+                iovs.guest_iov = guest_handle_cast(arg2, v4v_iov_t);
+
+                if ( unlikely(!guest_handle_okay(addr_hnd, 1)) )
+                    goto out;
+                if ( copy_from_guest(&addr, addr_hnd, 1) )
+                    goto out;
+
+                if ( unlikely(!guest_handle_okay(iovs.guest_iov, niov)) )
+                    goto out;
+
+                rc = v4v_sendv(d, &addr.src, &addr.dst, message_type, iovs, niov);
+                break;
+            }
+        case V4VOP_notify:
+            {
+                XEN_GUEST_HANDLE(v4v_ring_data_t) ring_data_hnd =
+                    guest_handle_cast(arg1, v4v_ring_data_t);
+                rc = v4v_notify(d, ring_data_hnd);
+                break;
+            }
+        case V4VOP_viptables_add:
+            {
+                uint32_t position = arg3;
+                XEN_GUEST_HANDLE(v4v_viptables_rule_t) rule_hnd =
+                    guest_handle_cast(arg1, v4v_viptables_rule_t);
+                rc = -EPERM;
+                if ( !IS_PRIV(d) )
+                    goto out;
+
+                write_lock(&viprules_lock);
+                rc = v4v_viptables_add(d, rule_hnd, position);
+                write_unlock(&viprules_lock);
+                break;
+            }
+        case V4VOP_viptables_del:
+            {
+                uint32_t position = arg3;
+                XEN_GUEST_HANDLE(v4v_viptables_rule_t) rule_hnd =
+                    guest_handle_cast(arg1, v4v_viptables_rule_t);
+                rc = -EPERM;
+                if ( !IS_PRIV(d) )
+                    goto out;
+
+                write_lock(&viprules_lock);
+                rc = v4v_viptables_del(d, rule_hnd, position);
+                write_unlock(&viprules_lock);
+                break;
+            }
+        case V4VOP_viptables_list:
+            {
+                XEN_GUEST_HANDLE(v4v_viptables_list_t) rules_list_hnd =
+                    guest_handle_cast(arg1, v4v_viptables_list_t);
+                rc = -EPERM;
+                if ( !IS_PRIV(d) )
+                    goto out;
+
+                rc = -EFAULT;
+                if ( unlikely(!guest_handle_okay(rules_list_hnd, 1)) )
+                    goto out;
+
+                read_lock(&viprules_lock);
+                rc = v4v_viptables_list(d, rules_list_hnd);
+                read_unlock(&viprules_lock);
+                break;
+            }
+        case V4VOP_info:
+            {
+                XEN_GUEST_HANDLE(v4v_info_t) info_hnd =
+                    guest_handle_cast(arg1, v4v_info_t);
+                v4v_info_t info;
+
+                if ( unlikely(!guest_handle_okay(info_hnd, 1)) )
+                    goto out;
+                v4v_info(d, &info);
+                if ( copy_to_guest(info_hnd, &info, 1) )
+                    goto out;
+                rc = 0;
+                break;
+            }
+        default:
+            rc = -ENOSYS;
+            break;
+    }
+out:
+    domain_unlock(d);
+    v4v_dprintk("<-do_v4v_op()=%d\n", (int)rc);
+    return rc;
+}
+
+/*
+ * init
+ */
+
+void
+v4v_destroy(struct domain *d)
+{
+    int i;
+
+    BUG_ON(!d->is_dying);
+    write_lock(&v4v_lock);
+
+    v4v_dprintk("d->v=%p\n", d->v4v);
+
+    if ( d->v4v )
+    {
+        for ( i = 0; i < V4V_HTABLE_SIZE; ++i )
+        {
+            struct hlist_node *node, *next;
+            struct v4v_ring_info *ring_info;
+
+            hlist_for_each_entry_safe(ring_info, node,
+                    next, &d->v4v->ring_hash[i],
+                    node)
+            {
+                v4v_ring_remove_info(d, ring_info);
+            }
+        }
+    }
+
+    d->v4v = NULL;
+    write_unlock(&v4v_lock);
+}
+
+int
+v4v_init(struct domain *d)
+{
+    struct v4v_domain *v4v;
+    evtchn_port_t port;
+    int i;
+    int rc;
+
+    v4v = xmalloc(struct v4v_domain);
+    if ( !v4v )
+        return -ENOMEM;
+
+    rc = evtchn_alloc_unbound_domain(d, &port, d->domain_id);
+    if ( rc )
+        return rc;
+
+    rwlock_init(&v4v->lock);
+
+    v4v->evtchn_port = port;
+    for ( i = 0; i < V4V_HTABLE_SIZE; ++i )
+        INIT_HLIST_HEAD(&v4v->ring_hash[i]);
+
+    write_lock(&v4v_lock);
+    d->v4v = v4v;
+    write_unlock(&v4v_lock);
+
+    return 0;
+}
+
+
+/*
+ * debug
+ */
+
+static void
+dump_domain_ring(struct domain *d, struct v4v_ring_info *ring_info)
+{
+    uint32_t rx_ptr;
+
+    printk(KERN_ERR "  ring: domid=%d port=0x%08x partner=%d npage=%d\n",
+           (int)d->domain_id, (int)ring_info->id.addr.port,
+           (int)ring_info->id.partner, (int)ring_info->npage);
+
+    if ( v4v_ringbuf_get_rx_ptr(d, ring_info, &rx_ptr) )
+    {
+        printk(KERN_ERR "   Failed to read rx_ptr\n");
+        return;
+    }
+
+    printk(KERN_ERR "   tx_ptr=%d rx_ptr=%d len=%d\n",
+           (int)ring_info->tx_ptr, (int)rx_ptr, (int)ring_info->len);
+}
+
+static void
+dump_domain(struct domain *d)
+{
+    int i;
+
+    printk(KERN_ERR " domain %d:\n", (int)d->domain_id);
+
+    read_lock(&d->v4v->lock);
+
+    for ( i = 0; i < V4V_HTABLE_SIZE; ++i )
+    {
+        struct hlist_node *node;
+        struct v4v_ring_info *ring_info;
+
+        hlist_for_each_entry(ring_info, node, &d->v4v->ring_hash[i], node)
+            dump_domain_ring(d, ring_info);
+    }
+
+    printk(KERN_ERR "  event channel: %d\n",  d->v4v->evtchn_port);
+    read_unlock(&d->v4v->lock);
+
+    printk(KERN_ERR "\n");
+    v4v_signal_domain(d);
+}
+
+static void
+dump_state(unsigned char key)
+{
+    struct domain *d;
+
+    printk(KERN_ERR "\n\nV4V:\n");
+    read_lock(&v4v_lock);
+
+    rcu_read_lock(&domlist_read_lock);
+
+    for_each_domain(d)
+        dump_domain(d);
+
+    rcu_read_unlock(&domlist_read_lock);
+
+    read_unlock(&v4v_lock);
+}
+
+struct keyhandler v4v_info_keyhandler =
+{
+    .diagnostic = 1,
+    .u.fn = dump_state,
+    .desc = "dump v4v states and interupt"
+};
+
+static int __init
+setup_dump_rings(void)
+{
+    register_keyhandler('4', &v4v_info_keyhandler);
+    return 0;
+}
+
+__initcall(setup_dump_rings);
+
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/public/v4v.h b/xen/include/public/v4v.h
new file mode 100644
index 0000000..380ac29
--- /dev/null
+++ b/xen/include/public/v4v.h
@@ -0,0 +1,324 @@
+/******************************************************************************
+ * V4V
+ *
+ * Version 2 of v2v (Virtual-to-Virtual)
+ *
+ * Copyright (c) 2010, Citrix Systems
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#ifndef __XEN_PUBLIC_V4V_H__
+#define __XEN_PUBLIC_V4V_H__
+
+#include "xen.h"
+#include "event_channel.h"
+
+/*
+ * Structure definitions
+ */
+
+#define V4V_RING_MAGIC          0xa822f72bb0b9d8ccULL
+#define V4V_RING_DATA_MAGIC	0x45fe852220b801d4ULL
+
+#define V4V_MESSAGE_DGRAM       0x3c2c1db8
+#define V4V_MESSAGE_STREAM 	0x70f6a8e5
+
+#define V4V_DOMID_ANY           DOMID_INVALID
+#define V4V_PORT_ANY            0
+
+typedef struct v4v_iov
+{
+    uint64_t iov_base;
+    uint32_t iov_len;
+    uint32_t pad;
+} v4v_iov_t;
+
+typedef struct v4v_addr
+{
+    uint32_t port;
+    domid_t domain;
+    uint16_t pad;
+} v4v_addr_t;
+
+typedef struct v4v_ring_id
+{
+    v4v_addr_t addr;
+    domid_t partner;
+    uint16_t pad;
+} v4v_ring_id_t;
+
+typedef struct
+{
+    v4v_addr_t src;
+    v4v_addr_t dst;
+} v4v_send_addr_t;
+
+/*
+ * v4v_ring
+ * id: xen only looks at this during register/unregister
+ *     and will fill in id.addr.domain
+ * rx_ptr: rx pointer, modified by domain
+ * tx_ptr: tx pointer, modified by xen
+ *
+ */
+struct v4v_ring
+{
+    uint64_t magic;
+    v4v_ring_id_t id;
+    uint32_t len;
+    uint32_t rx_ptr;
+    uint32_t tx_ptr;
+    uint8_t reserved[32];
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
+    uint8_t ring[];
+#elif defined(__GNUC__)
+    uint8_t ring[0];
+#endif
+};
+typedef struct v4v_ring v4v_ring_t;
+
+#define V4V_RING_DATA_F_EMPTY       (1U << 0) /* Ring is empty */
+#define V4V_RING_DATA_F_EXISTS      (1U << 1) /* Ring exists */
+#define V4V_RING_DATA_F_PENDING     (1U << 2) /* Pending interrupt exists - do not
+                                               * rely on this field - for
+                                               * profiling only */
+#define V4V_RING_DATA_F_SUFFICIENT  (1U << 3) /* Sufficient space to queue
+                                               * space_required bytes exists */
+
+typedef struct v4v_ring_data_ent
+{
+    v4v_addr_t ring;
+    uint16_t flags;
+    uint16_t pad1;
+    uint32_t pad2;
+    uint32_t space_required;
+    uint32_t max_message_size;
+} v4v_ring_data_ent_t;
+
+typedef struct v4v_ring_data
+{
+    uint64_t magic;
+    uint32_t nent;
+    uint32_t pad;
+    uint64_t reserved[4];
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
+    v4v_ring_data_ent_t data[];
+#elif defined(__GNUC__)
+    v4v_ring_data_ent_t data[0];
+#endif
+} v4v_ring_data_t;
+
+struct v4v_info
+{
+    uint64_t ring_magic;
+    uint64_t data_magic;
+    evtchn_port_t evtchn;
+    uint32_t pad;
+};
+typedef struct v4v_info v4v_info_t;
+
+#define V4V_SHF_SYN		(1 << 0)
+#define V4V_SHF_ACK		(1 << 1)
+#define V4V_SHF_RST		(1 << 2)
+
+#define V4V_SHF_PING		(1 << 8)
+#define V4V_SHF_PONG		(1 << 9)
+
+struct v4v_stream_header
+{
+    uint32_t flags;
+    uint32_t conid;
+};
+
+struct v4v_ring_message_header
+{
+    uint32_t len;
+    uint32_t pad0;
+    v4v_addr_t source;
+    uint32_t message_type;
+    uint32_t pad1;
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
+    uint8_t data[];
+#elif defined(__GNUC__)
+    uint8_t data[0];
+#endif
+};
+
+typedef struct v4v_viptables_rule
+{
+    v4v_addr_t src;
+    v4v_addr_t dst;
+    uint32_t accept;
+    uint32_t pad;
+} v4v_viptables_rule_t;
+
+typedef struct v4v_viptables_list
+{
+    uint32_t start_rule;
+    uint32_t nb_rules;
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
+    struct v4v_viptables_rule rules[];
+#elif defined(__GNUC__)
+    struct v4v_viptables_rule rules[0];
+#endif
+} v4v_viptables_list_t;
+
+/*
+ * HYPERCALLS
+ */
+
+/*
+ * V4VOP_register_ring
+ *
+ * Registers a ring with Xen. If a ring with the same v4v_ring_id exists,
+ * the hypercall will return -EEXIST.
+ *
+ * do_v4v_op(V4VOP_unregister_ring,
+ *           XEN_GUEST_HANDLE(v4v_ring_t), XEN_GUEST_HANDLE(xen_pfn_t),
+ *           npage, 0)
+ */
+#define V4VOP_register_ring 	1
+
+
+/*
+ * V4VOP_unregister_ring
+ *
+ * Unregister a ring.
+ *
+ * do_v4v_op(V4VOP_unregister_ring,
+ *           XEN_GUEST_HANDLE(v4v_ring),
+ *           NULL, 0, 0)
+ */
+#define V4VOP_unregister_ring 	2
+
+/*
+ * V4VOP_send
+ *
+ * Sends len bytes of buf to dst, giving src as the source address (xen will
+ * ignore src->domain and put your domain in the actually message), xen
+ * first looks for a ring with id.addr==dst and id.partner==sending_domain
+ * if that fails it looks for id.addr==dst and id.partner==DOMID_ANY.
+ * message_type is the 32 bit number used from the message
+ * most likely V4V_MESSAGE_DGRAM or V4V_MESSAGE_STREAM. If insufficient space exists
+ * it will return -EAGAIN and xen will twing the V4V_INTERRUPT when
+ * sufficient space becomes available
+ *
+ * do_v4v_op(V4VOP_send,
+ *           XEN_GUEST_HANDLE(v4v_send_addr_t) addr,
+ *           XEN_GUEST_HANDLE(void) buf,
+ *           uint32_t len,
+ *           uint32_t message_type)
+ */
+#define V4VOP_send 		3
+
+
+/*
+ * V4VOP_notify
+ *
+ * Asks xen for information about other rings in the system
+ *
+ * ent->ring is the v4v_addr_t of the ring you want information on
+ * the same matching rules are used as for V4VOP_send.
+ *
+ * ent->space_required  if this field is not null xen will check
+ * that there is space in the destination ring for this many bytes
+ * of payload. If there is it will set the V4V_RING_DATA_F_SUFFICIENT
+ * and CANCEL any pending interrupt for that ent->ring, if insufficient
+ * space is available it will schedule an interrupt and the flag will
+ * not be set.
+ *
+ * The flags are set by xen when notify replies
+ * V4V_RING_DATA_F_EMPTY	ring is empty
+ * V4V_RING_DATA_F_PENDING	interrupt is pending - don't rely on this
+ * V4V_RING_DATA_F_SUFFICIENT	sufficient space for space_required is there
+ * V4V_RING_DATA_F_EXISTS	ring exists
+ *
+ * do_v4v_op(V4VOP_notify,
+ *           XEN_GUEST_HANDLE(v4v_ring_data_ent) ent,
+ *           NULL, 0, 0)
+ */
+#define V4VOP_notify 		4
+
+/*
+ * V4VOP_sendv
+ *
+ * Identical to V4VOP_send except rather than buf and len it takes
+ * an array of v4v_iov and a length of the array.
+ *
+ * do_v4v_op(V4VOP_sendv,
+ *           XEN_GUEST_HANDLE(v4v_send_addr_t) addr,
+ *           XEN_GUEST_HANDLE(v4v_iov) iov,
+ *           uint32_t niov,
+ *           uint32_t message_type)
+ */
+#define V4VOP_sendv		5
+
+/*
+ * V4VOP_viptables_add
+ *
+ * Insert a filtering rules after a given position.
+ *
+ * do_v4v_op(V4VOP_viptables_add,
+ *           XEN_GUEST_HANDLE(v4v_viptables_rule_t) rule,
+ *           NULL,
+ *           uint32_t position, 0)
+ */
+#define V4VOP_viptables_add     6
+
+/*
+ * V4VOP_viptables_del
+ *
+ * Delete a filtering rules at a position or the rule
+ * that matches "rule".
+ *
+ * do_v4v_op(V4VOP_viptables_del,
+ *           XEN_GUEST_HANDLE(v4v_viptables_rule_t) rule,
+ *           NULL,
+ *           uint32_t position, 0)
+ */
+#define V4VOP_viptables_del     7
+
+/*
+ * V4VOP_viptables_list
+ *
+ * Delete a filtering rules at a position or the rule
+ * that matches "rule".
+ *
+ * do_v4v_op(V4VOP_viptables_list,
+ *           XEN_GUEST_HANDLE(v4v_vitpables_list_t) list,
+ *           NULL, 0, 0)
+ */
+#define V4VOP_viptables_list    8
+
+/*
+ * V4VOP_info
+ *
+ * Returns v4v info for the current domain (domain that issued the hypercall).
+ *      - V4V magic number
+ *      - event channel port (for current domain)
+ *
+ * do_v4v_op(V4VOP_info,
+ *           XEN_GUEST_HANDLE(v4v_info_t) info,
+ *           NULL, 0, 0)
+ */
+#define V4VOP_info              9
+
+#endif /* __XEN_PUBLIC_V4V_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 8def420..7a3fade 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -97,7 +97,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_pfn_t);
 #define __HYPERVISOR_domctl               36
 #define __HYPERVISOR_kexec_op             37
 #define __HYPERVISOR_tmem_op              38
-#define __HYPERVISOR_xc_reserved_op       39 /* reserved for XenClient */
+#define __HYPERVISOR_v4v_op               39
 
 /* Architecture-specific hypercall definitions. */
 #define __HYPERVISOR_arch_0               48
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 53804c8..296de52 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -23,6 +23,7 @@
 #include <public/sysctl.h>
 #include <public/vcpu.h>
 #include <public/mem_event.h>
+#include <xen/v4v.h>
 
 #ifdef CONFIG_COMPAT
 #include <compat/vcpu.h>
@@ -350,6 +351,9 @@ struct domain
     nodemask_t node_affinity;
     unsigned int last_alloc_node;
     spinlock_t node_affinity_lock;
+
+    /* v4v */
+    struct v4v_domain *v4v;
 };
 
 struct domain_setup_info
diff --git a/xen/include/xen/v4v.h b/xen/include/xen/v4v.h
new file mode 100644
index 0000000..05d20b5
--- /dev/null
+++ b/xen/include/xen/v4v.h
@@ -0,0 +1,133 @@
+/******************************************************************************
+ * V4V
+ *
+ * Version 2 of v2v (Virtual-to-Virtual)
+ *
+ * Copyright (c) 2010, Citrix Systems
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#ifndef __V4V_PRIVATE_H__
+#define __V4V_PRIVATE_H__
+
+#include <xen/config.h>
+#include <xen/types.h>
+#include <xen/spinlock.h>
+#include <xen/smp.h>
+#include <xen/shared.h>
+#include <xen/list.h>
+#include <public/v4v.h>
+
+#define V4V_HTABLE_SIZE 32
+
+/*
+ * Handlers
+ */
+
+DEFINE_XEN_GUEST_HANDLE (v4v_iov_t);
+DEFINE_XEN_GUEST_HANDLE (v4v_addr_t);
+DEFINE_XEN_GUEST_HANDLE (v4v_send_addr_t);
+DEFINE_XEN_GUEST_HANDLE (v4v_ring_t);
+DEFINE_XEN_GUEST_HANDLE (v4v_ring_data_ent_t);
+DEFINE_XEN_GUEST_HANDLE (v4v_ring_data_t);
+DEFINE_XEN_GUEST_HANDLE (v4v_info_t);
+
+DEFINE_XEN_GUEST_HANDLE (v4v_viptables_rule_t);
+DEFINE_XEN_GUEST_HANDLE (v4v_viptables_list_t);
+
+/*
+ * Helper functions
+ */
+
+static inline uint16_t
+v4v_hash_fn (v4v_ring_id_t *id)
+{
+    uint16_t ret;
+    ret = (uint16_t) (id->addr.port >> 16);
+    ret ^= (uint16_t) id->addr.port;
+    ret ^= id->addr.domain;
+    ret ^= id->partner;
+
+    ret &= (V4V_HTABLE_SIZE-1);
+
+    return ret;
+}
+
+struct v4v_pending_ent
+{
+    struct hlist_node node;
+    domid_t id;
+    uint32_t len;
+};
+
+
+struct v4v_ring_info
+{
+    /* next node in the hash, protected by L2  */
+    struct hlist_node node;
+    /* this ring's id, protected by L2 */
+    v4v_ring_id_t id;
+    /* L3 */
+    spinlock_t lock;
+    /* cached length of the ring (from ring->len), protected by L3 */
+    uint32_t len;
+    uint32_t npage;
+    /* cached tx pointer location, protected by L3 */
+    uint32_t tx_ptr;
+    /* guest ring, protected by L3 */
+    XEN_GUEST_HANDLE(v4v_ring_t) ring;
+    /* mapped ring pages protected by L3*/
+    uint8_t **mfn_mapping;
+    /* list of mfns of guest ring */
+    mfn_t *mfns;
+    /* list of struct v4v_pending_ent for this ring, L3 */
+    struct hlist_head pending;
+};
+
+/*
+ * The value of the v4v element in a struct domain is
+ * protected by the global lock L1
+ */
+struct v4v_domain
+{
+    /* L2 */
+    rwlock_t lock;
+    /* event channel */
+    evtchn_port_t evtchn_port;
+    /* protected by L2 */
+    struct hlist_head ring_hash[V4V_HTABLE_SIZE];
+};
+
+typedef struct v4v_viptables_rule_node
+{
+    struct list_head list;
+    v4v_viptables_rule_t rule;
+} v4v_viptables_rule_node_t;
+
+void v4v_destroy(struct domain *d);
+int v4v_init(struct domain *d);
+long do_v4v_op (int cmd,
+                XEN_GUEST_HANDLE (void) arg1,
+                XEN_GUEST_HANDLE (void) arg2,
+                uint32_t arg3,
+                uint32_t arg4);
+
+#endif /* __V4V_PRIVATE_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] xen: Add V4V implementation
  2012-09-20 11:42 ` [PATCH 4/4] xen: Add V4V implementation Jean Guyader
@ 2012-09-20 12:20   ` Jan Beulich
  2012-10-04 12:03     ` Jean Guyader
  2012-09-22 13:47   ` Julian Pidancet
  1 sibling, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2012-09-20 12:20 UTC (permalink / raw)
  To: Jean Guyader; +Cc: tim, xen-devel

>>> On 20.09.12 at 13:42, Jean Guyader <jean.guyader@citrix.com> wrote:
>+        case V4VOP_register_ring:
>+            {
>+                XEN_GUEST_HANDLE(v4v_ring_t) ring_hnd =
>+                    guest_handle_cast(arg1, v4v_ring_t);
>+                XEN_GUEST_HANDLE(xen_pfn_t) pfn_hnd =
>+                    guest_handle_cast(arg2, xen_pfn_t);
>+                uint32_t npage = arg3;
>+                if ( unlikely(!guest_handle_okay(ring_hnd, 1)) )
>+                    goto out;
>+                if ( unlikely(!guest_handle_okay(pfn_hnd, npage)) )
>+                    goto out;

Here and below - this isn't sufficient for compat guests, or else
you give them a way to point into the compat m2p table.

>+                if ( unlikely(!guest_handle_okay(addr_hnd, 1)) )
>+                    goto out;
>+                if ( copy_from_guest(&addr, addr_hnd, 1) )
>+                    goto out;

The former is redundant with the latter. Also, if you already
used guest_handle_okay() on an array, you then can (and
should) use the cheaper __copy_{to,from}_guest() functions.
While looking at this, I also spotted at least on guest access
without error checking. Plus many guest accesses happen
with some sort of lock held - that'll cause problems when the
guest memory you're trying to access was paged out (quite
likely you would be able to point out other such cases, but
those likely predate paging, and hence are an oversight of
whoever introduced the paging code).

>+struct v4v_ring_message_header
>+{
>+    uint32_t len;
>+    uint32_t pad0;

Why? v4v_addr_t is 4-byte aligned.

>+    v4v_addr_t source;
>+    uint32_t message_type;
>+    uint32_t pad1;

And again, why?

>+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
>+    uint8_t data[];
>+#elif defined(__GNUC__)
>+    uint8_t data[0];
>+#endif
>+};
...
>+/*
>+ * V4VOP_register_ring
>+ *
>+ * Registers a ring with Xen. If a ring with the same v4v_ring_id exists,
>+ * the hypercall will return -EEXIST.
>+ *
>+ * do_v4v_op(V4VOP_unregister_ring,
>+ *           XEN_GUEST_HANDLE(v4v_ring_t), XEN_GUEST_HANDLE(xen_pfn_t),
>+ *           npage, 0)
>+ */
>+#define V4VOP_register_ring 	1

"register" or "unregister"?

Also, note the use of v4v_ring_t here, but ...

>+/*
>+ * V4VOP_unregister_ring
>+ *
>+ * Unregister a ring.
>+ *
>+ * do_v4v_op(V4VOP_unregister_ring,
>+ *           XEN_GUEST_HANDLE(v4v_ring),
>+ *           NULL, 0, 0)
>+ */
>+#define V4VOP_unregister_ring 	2

... not here. Please be consistent, even if this is just comments.
(Again at least for V4VOP_sendv.)

>+/*
>+ * V4VOP_viptables_list
>+ *
>+ * Delete a filtering rules at a position or the rule

Delete? Also, here and above, make clear whether you talk
about a single or multiple rules.

>+ * that matches "rule".
>+ *
>+ * do_v4v_op(V4VOP_viptables_list,
>+ *           XEN_GUEST_HANDLE(v4v_vitpables_list_t) list,
>+ *           NULL, 0, 0)
>+ */
>+#define V4VOP_viptables_list    8

Also, with all of the padding fields and unused arguments (for
certain sub-ops) - if you see the slightest chance of them ever
getting used for some extension, you will want to make sure
they got zero-filled by the guest.

Jan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] xen: Add V4V implementation
  2012-09-20 11:42 ` [PATCH 4/4] xen: Add V4V implementation Jean Guyader
  2012-09-20 12:20   ` Jan Beulich
@ 2012-09-22 13:47   ` Julian Pidancet
  2012-09-27  9:37     ` Jean Guyader
  1 sibling, 1 reply; 21+ messages in thread
From: Julian Pidancet @ 2012-09-22 13:47 UTC (permalink / raw)
  To: Jean Guyader; +Cc: tim, JBeulich, xen-devel

On 09/20/12 12:42, Jean Guyader wrote:
> 
> Setup of v4v domains a domain gets created and cleanup
> when a domain die. Wire up the v4v hypercall.
> 
> Include v4v internal and public headers.
> 

Could we rename "viptables" in "v4vtables", or something that doesn't
contain "IP" in the name ? Since IP is a totally unrelated protocol, it
doesn't make sense here.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] xen: Add V4V implementation
  2012-09-22 13:47   ` Julian Pidancet
@ 2012-09-27  9:37     ` Jean Guyader
  0 siblings, 0 replies; 21+ messages in thread
From: Jean Guyader @ 2012-09-27  9:37 UTC (permalink / raw)
  To: Julian Pidancet; +Cc: tim, JBeulich, Jean Guyader, xen-devel

On 22 September 2012 14:47, Julian Pidancet <julian.pidancet@gmail.com> wrote:
> On 09/20/12 12:42, Jean Guyader wrote:
>>
>> Setup of v4v domains a domain gets created and cleanup
>> when a domain die. Wire up the v4v hypercall.
>>
>> Include v4v internal and public headers.
>>
>
> Could we rename "viptables" in "v4vtables", or something that doesn't
> contain "IP" in the name ? Since IP is a totally unrelated protocol, it
> doesn't make sense here.
>

Ok sure, I don't have any objections to that.

Jean

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] xen: Add V4V implementation
  2012-09-20 12:20   ` Jan Beulich
@ 2012-10-04 12:03     ` Jean Guyader
  2012-10-04 12:11       ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Jean Guyader @ 2012-10-04 12:03 UTC (permalink / raw)
  To: Jan Beulich; +Cc: tim, Jean Guyader, xen-devel

On 20 September 2012 13:20, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 20.09.12 at 13:42, Jean Guyader <jean.guyader@citrix.com> wrote:
>>+        case V4VOP_register_ring:
>>+            {
>>+                XEN_GUEST_HANDLE(v4v_ring_t) ring_hnd =
>>+                    guest_handle_cast(arg1, v4v_ring_t);
>>+                XEN_GUEST_HANDLE(xen_pfn_t) pfn_hnd =
>>+                    guest_handle_cast(arg2, xen_pfn_t);
>>+                uint32_t npage = arg3;
>>+                if ( unlikely(!guest_handle_okay(ring_hnd, 1)) )
>>+                    goto out;
>>+                if ( unlikely(!guest_handle_okay(pfn_hnd, npage)) )
>>+                    goto out;
>
> Here and below - this isn't sufficient for compat guests, or else
> you give them a way to point into the compat m2p table.
>

I'll probably switch to uint64_t for the v4v mfn list instead of using
xen_pfn_t which
are unsigned long. That way I can save the need for a compat wrapper.

>>+                if ( unlikely(!guest_handle_okay(addr_hnd, 1)) )
>>+                    goto out;
>>+                if ( copy_from_guest(&addr, addr_hnd, 1) )
>>+                    goto out;
>
> The former is redundant with the latter. Also, if you already
> used guest_handle_okay() on an array, you then can (and
> should) use the cheaper __copy_{to,from}_guest() functions.
> While looking at this, I also spotted at least on guest access
> without error checking. Plus many guest accesses happen
> with some sort of lock held - that'll cause problems when the
> guest memory you're trying to access was paged out (quite
> likely you would be able to point out other such cases, but
> those likely predate paging, and hence are an oversight of
> whoever introduced the paging code).
>

There are plenty of other place in Xen where we copy data from
a guest with some sort of lock held.

I understand that the new code should do it's best to make sure
it works correctly with the Xen paging code.

The best way to solve this might not be try to to avoid copying from
guest memory under a lock. I've discussed this with Tim and maybe
we could introduce a new copy_from_guest which is aware of the paging
and returns a specific error, and then we could cleanly unlock everything
and issue a continuation that will fixup the memory.

>>+struct v4v_ring_message_header
>>+{
>>+    uint32_t len;
>>+    uint32_t pad0;
>
> Why? v4v_addr_t is 4-byte aligned.
>
>>+    v4v_addr_t source;
>>+    uint32_t message_type;
>>+    uint32_t pad1;
>
> And again, why?
>
>>+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
>>+    uint8_t data[];
>>+#elif defined(__GNUC__)
>>+    uint8_t data[0];
>>+#endif
>>+};
> ...
>>+/*
>>+ * V4VOP_register_ring
>>+ *
>>+ * Registers a ring with Xen. If a ring with the same v4v_ring_id exists,
>>+ * the hypercall will return -EEXIST.
>>+ *
>>+ * do_v4v_op(V4VOP_unregister_ring,
>>+ *           XEN_GUEST_HANDLE(v4v_ring_t), XEN_GUEST_HANDLE(xen_pfn_t),
>>+ *           npage, 0)
>>+ */
>>+#define V4VOP_register_ring   1
>
> "register" or "unregister"?
>
> Also, note the use of v4v_ring_t here, but ...
>
>>+/*
>>+ * V4VOP_unregister_ring
>>+ *
>>+ * Unregister a ring.
>>+ *
>>+ * do_v4v_op(V4VOP_unregister_ring,
>>+ *           XEN_GUEST_HANDLE(v4v_ring),
>>+ *           NULL, 0, 0)
>>+ */
>>+#define V4VOP_unregister_ring         2
>
> ... not here. Please be consistent, even if this is just comments.
> (Again at least for V4VOP_sendv.)
>
>>+/*
>>+ * V4VOP_viptables_list
>>+ *
>>+ * Delete a filtering rules at a position or the rule
>
> Delete? Also, here and above, make clear whether you talk
> about a single or multiple rules.
>
>>+ * that matches "rule".
>>+ *
>>+ * do_v4v_op(V4VOP_viptables_list,
>>+ *           XEN_GUEST_HANDLE(v4v_vitpables_list_t) list,
>>+ *           NULL, 0, 0)
>>+ */
>>+#define V4VOP_viptables_list    8
>
> Also, with all of the padding fields and unused arguments (for
> certain sub-ops) - if you see the slightest chance of them ever
> getting used for some extension, you will want to make sure
> they got zero-filled by the guest.
>


Thanks for the review,
Jean

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] xen: Add V4V implementation
  2012-10-04 12:03     ` Jean Guyader
@ 2012-10-04 12:11       ` Jan Beulich
  2012-10-04 12:41         ` Jean Guyader
                           ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Jan Beulich @ 2012-10-04 12:11 UTC (permalink / raw)
  To: Jean Guyader; +Cc: tim, Jean Guyader, xen-devel

>>> On 04.10.12 at 14:03, Jean Guyader <jean.guyader@gmail.com> wrote:
> On 20 September 2012 13:20, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 20.09.12 at 13:42, Jean Guyader <jean.guyader@citrix.com> wrote:
>>>+        case V4VOP_register_ring:
>>>+            {
>>>+                XEN_GUEST_HANDLE(v4v_ring_t) ring_hnd =
>>>+                    guest_handle_cast(arg1, v4v_ring_t);
>>>+                XEN_GUEST_HANDLE(xen_pfn_t) pfn_hnd =
>>>+                    guest_handle_cast(arg2, xen_pfn_t);
>>>+                uint32_t npage = arg3;
>>>+                if ( unlikely(!guest_handle_okay(ring_hnd, 1)) )
>>>+                    goto out;
>>>+                if ( unlikely(!guest_handle_okay(pfn_hnd, npage)) )
>>>+                    goto out;
>>
>> Here and below - this isn't sufficient for compat guests, or else
>> you give them a way to point into the compat m2p table.
>>
> 
> I'll probably switch to uint64_t for the v4v mfn list instead of using
> xen_pfn_t which
> are unsigned long. That way I can save the need for a compat wrapper.

But that comment of yours doesn't address the problem I pointed
out.

>>>+                if ( unlikely(!guest_handle_okay(addr_hnd, 1)) )
>>>+                    goto out;
>>>+                if ( copy_from_guest(&addr, addr_hnd, 1) )
>>>+                    goto out;
>>
>> The former is redundant with the latter. Also, if you already
>> used guest_handle_okay() on an array, you then can (and
>> should) use the cheaper __copy_{to,from}_guest() functions.
>> While looking at this, I also spotted at least on guest access
>> without error checking. Plus many guest accesses happen
>> with some sort of lock held - that'll cause problems when the
>> guest memory you're trying to access was paged out (quite
>> likely you would be able to point out other such cases, but
>> those likely predate paging, and hence are an oversight of
>> whoever introduced the paging code).
>>
> 
> There are plenty of other place in Xen where we copy data from
> a guest with some sort of lock held.
> 
> I understand that the new code should do it's best to make sure
> it works correctly with the Xen paging code.
> 
> The best way to solve this might not be try to to avoid copying from
> guest memory under a lock. I've discussed this with Tim and maybe
> we could introduce a new copy_from_guest which is aware of the paging
> and returns a specific error, and then we could cleanly unlock everything
> and issue a continuation that will fixup the memory.

That's certainly an alternative.

Jan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] xen: Add V4V implementation
  2012-10-04 12:11       ` Jan Beulich
@ 2012-10-04 12:41         ` Jean Guyader
  2012-10-04 15:03         ` Jean Guyader
       [not found]         ` <CAEBdQ92Box+2PMohpDi6Sy=PXFmP_sxYb0ZYSOCU4U3iPdocqQ@mail.gmail.com>
  2 siblings, 0 replies; 21+ messages in thread
From: Jean Guyader @ 2012-10-04 12:41 UTC (permalink / raw)
  To: Jan Beulich; +Cc: tim, Jean Guyader, xen-devel

On 4 October 2012 13:11, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 04.10.12 at 14:03, Jean Guyader <jean.guyader@gmail.com> wrote:
>> On 20 September 2012 13:20, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 20.09.12 at 13:42, Jean Guyader <jean.guyader@citrix.com> wrote:
>>>>+        case V4VOP_register_ring:
>>>>+            {
>>>>+                XEN_GUEST_HANDLE(v4v_ring_t) ring_hnd =
>>>>+                    guest_handle_cast(arg1, v4v_ring_t);
>>>>+                XEN_GUEST_HANDLE(xen_pfn_t) pfn_hnd =
>>>>+                    guest_handle_cast(arg2, xen_pfn_t);
>>>>+                uint32_t npage = arg3;
>>>>+                if ( unlikely(!guest_handle_okay(ring_hnd, 1)) )
>>>>+                    goto out;
>>>>+                if ( unlikely(!guest_handle_okay(pfn_hnd, npage)) )
>>>>+                    goto out;
>>>
>>> Here and below - this isn't sufficient for compat guests, or else
>>> you give them a way to point into the compat m2p table.
>>>
>>
>> I'll probably switch to uint64_t for the v4v mfn list instead of using
>> xen_pfn_t which
>> are unsigned long. That way I can save the need for a compat wrapper.
>
> But that comment of yours doesn't address the problem I pointed
> out.
>
>>>>+                if ( unlikely(!guest_handle_okay(addr_hnd, 1)) )
>>>>+                    goto out;
>>>>+                if ( copy_from_guest(&addr, addr_hnd, 1) )
>>>>+                    goto out;
>>>
>>> The former is redundant with the latter. Also, if you already
>>> used guest_handle_okay() on an array, you then can (and
>>> should) use the cheaper __copy_{to,from}_guest() functions.
>>> While looking at this, I also spotted at least on guest access
>>> without error checking. Plus many guest accesses happen
>>> with some sort of lock held - that'll cause problems when the
>>> guest memory you're trying to access was paged out (quite
>>> likely you would be able to point out other such cases, but
>>> those likely predate paging, and hence are an oversight of
>>> whoever introduced the paging code).
>>>
>>
>> There are plenty of other place in Xen where we copy data from
>> a guest with some sort of lock held.
>>
>> I understand that the new code should do it's best to make sure
>> it works correctly with the Xen paging code.
>>
>> The best way to solve this might not be try to to avoid copying from
>> guest memory under a lock. I've discussed this with Tim and maybe
>> we could introduce a new copy_from_guest which is aware of the paging
>> and returns a specific error, and then we could cleanly unlock everything
>> and issue a continuation that will fixup the memory.
>
> That's certainly an alternative.
>

Another way (probably a easier way) would be to add a "mlock" mechanism.

Some of the pages could be lock which mean that we know before hand that
a copy_from_guest won't raise a paging error.

In my case it would be ideal. I can comment it out in the
register/unregister ring
code, so when we have such mechanism in place the v4v code could be updated.

Jean

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] xen: Add V4V implementation
  2012-10-04 12:11       ` Jan Beulich
  2012-10-04 12:41         ` Jean Guyader
@ 2012-10-04 15:03         ` Jean Guyader
  2012-10-04 15:12           ` Tim Deegan
       [not found]         ` <CAEBdQ92Box+2PMohpDi6Sy=PXFmP_sxYb0ZYSOCU4U3iPdocqQ@mail.gmail.com>
  2 siblings, 1 reply; 21+ messages in thread
From: Jean Guyader @ 2012-10-04 15:03 UTC (permalink / raw)
  To: Jan Beulich; +Cc: tim, Jean Guyader, xen-devel

On 4 October 2012 13:11, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 04.10.12 at 14:03, Jean Guyader <jean.guyader@gmail.com> wrote:
>> On 20 September 2012 13:20, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 20.09.12 at 13:42, Jean Guyader <jean.guyader@citrix.com> wrote:
>>>>+        case V4VOP_register_ring:
>>>>+            {
>>>>+                XEN_GUEST_HANDLE(v4v_ring_t) ring_hnd =
>>>>+                    guest_handle_cast(arg1, v4v_ring_t);
>>>>+                XEN_GUEST_HANDLE(xen_pfn_t) pfn_hnd =
>>>>+                    guest_handle_cast(arg2, xen_pfn_t);
>>>>+                uint32_t npage = arg3;
>>>>+                if ( unlikely(!guest_handle_okay(ring_hnd, 1)) )
>>>>+                    goto out;
>>>>+                if ( unlikely(!guest_handle_okay(pfn_hnd, npage)) )
>>>>+                    goto out;
>>>
>>> Here and below - this isn't sufficient for compat guests, or else
>>> you give them a way to point into the compat m2p table.
>>>
>>
>> I'll probably switch to uint64_t for the v4v mfn list instead of using
>> xen_pfn_t which
>> are unsigned long. That way I can save the need for a compat wrapper.
>
> But that comment of yours doesn't address the problem I pointed
> out.
>

[Resent, CCing everyone this time]

I'm sorry, I don't really get what you mean them. I've tried to get
all my struct
layout such as all the offset for the field are the same for 64b and 32b, this
way I thought I could get away with doing a compat wrapper.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] xen: Add V4V implementation
       [not found]         ` <CAEBdQ92Box+2PMohpDi6Sy=PXFmP_sxYb0ZYSOCU4U3iPdocqQ@mail.gmail.com>
@ 2012-10-04 15:07           ` Jan Beulich
  0 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2012-10-04 15:07 UTC (permalink / raw)
  To: Jean Guyader; +Cc: xen-devel

>>> On 04.10.12 at 17:02, Jean Guyader <jean.guyader@gmail.com> wrote:
> On 4 October 2012 13:11, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 04.10.12 at 14:03, Jean Guyader <jean.guyader@gmail.com> wrote:
>>> On 20 September 2012 13:20, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>> On 20.09.12 at 13:42, Jean Guyader <jean.guyader@citrix.com> wrote:
>>>>>+        case V4VOP_register_ring:
>>>>>+            {
>>>>>+                XEN_GUEST_HANDLE(v4v_ring_t) ring_hnd =
>>>>>+                    guest_handle_cast(arg1, v4v_ring_t);
>>>>>+                XEN_GUEST_HANDLE(xen_pfn_t) pfn_hnd =
>>>>>+                    guest_handle_cast(arg2, xen_pfn_t);
>>>>>+                uint32_t npage = arg3;
>>>>>+                if ( unlikely(!guest_handle_okay(ring_hnd, 1)) )
>>>>>+                    goto out;
>>>>>+                if ( unlikely(!guest_handle_okay(pfn_hnd, npage)) )
>>>>>+                    goto out;
>>>>
>>>> Here and below - this isn't sufficient for compat guests, or else
>>>> you give them a way to point into the compat m2p table.
>>>>
>>>
>>> I'll probably switch to uint64_t for the v4v mfn list instead of using
>>> xen_pfn_t which
>>> are unsigned long. That way I can save the need for a compat wrapper.
>>
>> But that comment of yours doesn't address the problem I pointed
>> out.
>>
> 
> I'm sorry, I don't really get what you mean them. I've tried to get
> all my struct
> layout such as all the offset for the field are the same for 64b and 32b, 
> this
> way I thought I could get away with doing a compat wrapper.

guest_handle_okay() simply isn't suitable for checking compat mode
pointers - compat_handle_okay() needs to be used for them instead.

Jan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] xen: Add V4V implementation
  2012-10-04 15:03         ` Jean Guyader
@ 2012-10-04 15:12           ` Tim Deegan
  2012-10-04 17:00             ` Jean Guyader
  0 siblings, 1 reply; 21+ messages in thread
From: Tim Deegan @ 2012-10-04 15:12 UTC (permalink / raw)
  To: Jean Guyader; +Cc: Jean Guyader, Jan Beulich, xen-devel

At 16:03 +0100 on 04 Oct (1349366589), Jean Guyader wrote:
> On 4 October 2012 13:11, Jan Beulich <JBeulich@suse.com> wrote:
> >>>> On 04.10.12 at 14:03, Jean Guyader <jean.guyader@gmail.com> wrote:
> >> On 20 September 2012 13:20, Jan Beulich <JBeulich@suse.com> wrote:
> >>>>>> On 20.09.12 at 13:42, Jean Guyader <jean.guyader@citrix.com> wrote:
> >>>>+        case V4VOP_register_ring:
> >>>>+            {
> >>>>+                XEN_GUEST_HANDLE(v4v_ring_t) ring_hnd =
> >>>>+                    guest_handle_cast(arg1, v4v_ring_t);
> >>>>+                XEN_GUEST_HANDLE(xen_pfn_t) pfn_hnd =
> >>>>+                    guest_handle_cast(arg2, xen_pfn_t);
> >>>>+                uint32_t npage = arg3;
> >>>>+                if ( unlikely(!guest_handle_okay(ring_hnd, 1)) )
> >>>>+                    goto out;
> >>>>+                if ( unlikely(!guest_handle_okay(pfn_hnd, npage)) )
> >>>>+                    goto out;
> >>>
> >>> Here and below - this isn't sufficient for compat guests, or else
> >>> you give them a way to point into the compat m2p table.
> >>>
> >>
> >> I'll probably switch to uint64_t for the v4v mfn list instead of using
> >> xen_pfn_t which
> >> are unsigned long. That way I can save the need for a compat wrapper.
> >
> > But that comment of yours doesn't address the problem I pointed
> > out.
> >
> 
> [Resent, CCing everyone this time]
> 
> I'm sorry, I don't really get what you mean them. I've tried to get
> all my struct
> layout such as all the offset for the field are the same for 64b and 32b, this
> way I thought I could get away with doing a compat wrapper.

Even if the args don't need translation, compat-mode guests have
different VA layouts and need different range checks (though I'm not
sure why these aren't automatically adjusted based on current).

AIUI you need to use compat_handle_okay() instead of guest_handle_okay()
to check the handles if is_pv_32on64_domain(current).

Tim.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] xen: Add V4V implementation
  2012-10-04 15:12           ` Tim Deegan
@ 2012-10-04 17:00             ` Jean Guyader
  2012-10-05  8:52               ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Jean Guyader @ 2012-10-04 17:00 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Jean Guyader, Jan Beulich, xen-devel

[-- Attachment #1: Type: text/plain, Size: 2078 bytes --]

On 4 October 2012 16:12, Tim Deegan <tim@xen.org> wrote:
> At 16:03 +0100 on 04 Oct (1349366589), Jean Guyader wrote:
>> On 4 October 2012 13:11, Jan Beulich <JBeulich@suse.com> wrote:
>> >>>> On 04.10.12 at 14:03, Jean Guyader <jean.guyader@gmail.com> wrote:
>> >> On 20 September 2012 13:20, Jan Beulich <JBeulich@suse.com> wrote:
>> >>>>>> On 20.09.12 at 13:42, Jean Guyader <jean.guyader@citrix.com> wrote:
>> >>>>+        case V4VOP_register_ring:
>> >>>>+            {
>> >>>>+                XEN_GUEST_HANDLE(v4v_ring_t) ring_hnd =
>> >>>>+                    guest_handle_cast(arg1, v4v_ring_t);
>> >>>>+                XEN_GUEST_HANDLE(xen_pfn_t) pfn_hnd =
>> >>>>+                    guest_handle_cast(arg2, xen_pfn_t);
>> >>>>+                uint32_t npage = arg3;
>> >>>>+                if ( unlikely(!guest_handle_okay(ring_hnd, 1)) )
>> >>>>+                    goto out;
>> >>>>+                if ( unlikely(!guest_handle_okay(pfn_hnd, npage)) )
>> >>>>+                    goto out;
>> >>>
>> >>> Here and below - this isn't sufficient for compat guests, or else
>> >>> you give them a way to point into the compat m2p table.
>> >>>
>> >>
>> >> I'll probably switch to uint64_t for the v4v mfn list instead of using
>> >> xen_pfn_t which
>> >> are unsigned long. That way I can save the need for a compat wrapper.
>> >
>> > But that comment of yours doesn't address the problem I pointed
>> > out.
>> >
>>
>> [Resent, CCing everyone this time]
>>
>> I'm sorry, I don't really get what you mean them. I've tried to get
>> all my struct
>> layout such as all the offset for the field are the same for 64b and 32b, this
>> way I thought I could get away with doing a compat wrapper.
>
> Even if the args don't need translation, compat-mode guests have
> different VA layouts and need different range checks (though I'm not
> sure why these aren't automatically adjusted based on current).
>
> AIUI you need to use compat_handle_okay() instead of guest_handle_okay()
> to check the handles if is_pv_32on64_domain(current).
>

How about something like that?

Jean

[-- Attachment #2: guest_handle_okay_maybe_compat.patch --]
[-- Type: application/octet-stream, Size: 1085 bytes --]

diff --git a/xen/include/asm-x86/guest_access.h b/xen/include/asm-x86/guest_access.h
index e3ac1d6..596d8a0 100644
--- a/xen/include/asm-x86/guest_access.h
+++ b/xen/include/asm-x86/guest_access.h
@@ -104,9 +104,12 @@
  * Pre-validate a guest handle.
  * Allows use of faster __copy_* functions.
  */
-#define guest_handle_okay(hnd, nr)                      \
-    (paging_mode_external(current->domain) ||           \
-     array_access_ok((hnd).p, (nr), sizeof(*(hnd).p)))
+#define guest_handle_okay(hnd, nr)                                      \
+    (is_pv_32on64_domain(current->domain) &&                            \
+     compat_array_access_ok((hnd).p, (nr), sizeof(*(hnd).p))) ||        \
+    (!is_pv_32on64_domain(current->domain) &&                           \
+     (paging_mode_external(current->domain) ||                          \
+      array_access_ok((hnd).p, (nr), sizeof(*(hnd).p))))
 #define guest_handle_subrange_okay(hnd, first, last)    \
     (paging_mode_external(current->domain) ||           \
      array_access_ok((hnd).p + (first),                 \

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] xen: Add V4V implementation
  2012-10-04 17:00             ` Jean Guyader
@ 2012-10-05  8:52               ` Jan Beulich
  2012-10-05  9:07                 ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2012-10-05  8:52 UTC (permalink / raw)
  To: Jean Guyader; +Cc: Tim Deegan, Jean Guyader, xen-devel

>>> On 04.10.12 at 19:00, Jean Guyader <jean.guyader@gmail.com> wrote:
> On 4 October 2012 16:12, Tim Deegan <tim@xen.org> wrote:
>> AIUI you need to use compat_handle_okay() instead of guest_handle_okay()
>> to check the handles if is_pv_32on64_domain(current).
>>
> 
> How about something like that?

Could be done, but not by modifying guest_handle_okay() (which
penalizes all its users), nor by (incorrectly) open-coding
compat_handle_okay(). And neither should any such implementation
use is_pv_32on64_domain() twice - use the conditional operator
instead (that way you also avoid evaluating arguments twice).

So you could either introduce e.g. any_guest_handle_okay(), or
switch all current users of guest_handle_okay() to, say,
native_handle_okay() (perhaps with the exception of those where
the compat mode wrapper source files #define guest_handle_okay()
to compat_handle_okay(), which could then be dropped).

Jan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] xen: Add V4V implementation
  2012-10-05  8:52               ` Jan Beulich
@ 2012-10-05  9:07                 ` Jan Beulich
  2012-10-08 10:24                   ` Jean Guyader
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2012-10-05  9:07 UTC (permalink / raw)
  To: Jean Guyader; +Cc: Tim Deegan, Jean Guyader, xen-devel

>>> On 05.10.12 at 10:52, "Jan Beulich" <JBeulich@suse.com> wrote:
>>>> On 04.10.12 at 19:00, Jean Guyader <jean.guyader@gmail.com> wrote:
>> On 4 October 2012 16:12, Tim Deegan <tim@xen.org> wrote:
>>> AIUI you need to use compat_handle_okay() instead of guest_handle_okay()
>>> to check the handles if is_pv_32on64_domain(current).
>>>
>> 
>> How about something like that?
> 
> Could be done, but not by modifying guest_handle_okay() (which
> penalizes all its users), nor by (incorrectly) open-coding
> compat_handle_okay(). And neither should any such implementation
> use is_pv_32on64_domain() twice - use the conditional operator
> instead (that way you also avoid evaluating arguments twice).
> 
> So you could either introduce e.g. any_guest_handle_okay(), or
> switch all current users of guest_handle_okay() to, say,
> native_handle_okay() (perhaps with the exception of those where
> the compat mode wrapper source files #define guest_handle_okay()
> to compat_handle_okay(), which could then be dropped).

Actually, after some more recalling of how all this was put together,
you can use the existing guest_handle_okay() on anything that
legitimately is a XEN_GUEST_HANDLE(). In particular, direct
hypercall arguments can be expressed that way since they get
properly zero-extended from 32 to 64 bits on the hypercall entry
path. The things needing extra consideration are only handles
embedded in structures - you need to either translate these
handles, or validate that their upper halves are zero.

That's also one of the reasons why I didn't make the guest
memory accessor/validation macros transparently handle both
cases.

Jan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] xen: Add V4V implementation
  2012-10-05  9:07                 ` Jan Beulich
@ 2012-10-08 10:24                   ` Jean Guyader
  0 siblings, 0 replies; 21+ messages in thread
From: Jean Guyader @ 2012-10-08 10:24 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Tim Deegan, Jean Guyader, xen-devel

On 5 October 2012 10:07, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 05.10.12 at 10:52, "Jan Beulich" <JBeulich@suse.com> wrote:
>>>>> On 04.10.12 at 19:00, Jean Guyader <jean.guyader@gmail.com> wrote:
>>> On 4 October 2012 16:12, Tim Deegan <tim@xen.org> wrote:
>>>> AIUI you need to use compat_handle_okay() instead of guest_handle_okay()
>>>> to check the handles if is_pv_32on64_domain(current).
>>>>
>>>
>>> How about something like that?
>>
>> Could be done, but not by modifying guest_handle_okay() (which
>> penalizes all its users), nor by (incorrectly) open-coding
>> compat_handle_okay(). And neither should any such implementation
>> use is_pv_32on64_domain() twice - use the conditional operator
>> instead (that way you also avoid evaluating arguments twice).
>>
>> So you could either introduce e.g. any_guest_handle_okay(), or
>> switch all current users of guest_handle_okay() to, say,
>> native_handle_okay() (perhaps with the exception of those where
>> the compat mode wrapper source files #define guest_handle_okay()
>> to compat_handle_okay(), which could then be dropped).
>
> Actually, after some more recalling of how all this was put together,
> you can use the existing guest_handle_okay() on anything that
> legitimately is a XEN_GUEST_HANDLE(). In particular, direct
> hypercall arguments can be expressed that way since they get
> properly zero-extended from 32 to 64 bits on the hypercall entry
> path. The things needing extra consideration are only handles
> embedded in structures - you need to either translate these
> handles, or validate that their upper halves are zero.
>
> That's also one of the reasons why I didn't make the guest
> memory accessor/validation macros transparently handle both
> cases.
>

Ok, thanks for the background info.

In my hypercall I only use direct hypercall argument handle.
The other handles that I use have been created with
guest_handle_add_offset from the original argument handle.

Considering this, I think I might be ok.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 4/4] xen: Add V4V implementation
  2012-09-20  9:47 [PATCH 0/4] Add V4V to Xen v5 Jean Guyader
@ 2012-09-20  9:47 ` Jean Guyader
  0 siblings, 0 replies; 21+ messages in thread
From: Jean Guyader @ 2012-09-20  9:47 UTC (permalink / raw)
  To: xen-devel; +Cc: tim, Jean Guyader, JBeulich

[-- Attachment #1: Type: text/plain, Size: 900 bytes --]


Setup of v4v domains a domain gets created and cleanup
when a domain die. Wire up the v4v hypercall.

Include v4v internal and public headers.

Signed-off-by: Jean Guyader <jean.guyader@citrix.com>
---
 xen/arch/x86/hvm/hvm.c             |    6 +-
 xen/arch/x86/x86_64/compat/entry.S |    2 +
 xen/arch/x86/x86_64/entry.S        |    2 +
 xen/common/Makefile                |    1 +
 xen/common/domain.c                |   13 +-
 xen/common/v4v.c                   | 1905 ++++++++++++++++++++++++++++++++++++
 xen/include/public/v4v.h           |  308 ++++++
 xen/include/public/xen.h           |    2 +-
 xen/include/xen/sched.h            |    4 +
 xen/include/xen/v4v.h              |  133 +++
 10 files changed, 2372 insertions(+), 4 deletions(-)
 create mode 100644 xen/common/v4v.c
 create mode 100644 xen/include/public/v4v.h
 create mode 100644 xen/include/xen/v4v.h


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0004-xen-Add-V4V-implementation.patch --]
[-- Type: text/x-patch; name="0004-xen-Add-V4V-implementation.patch", Size: 69141 bytes --]

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index d69e419..b685535 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3183,7 +3183,8 @@ static hvm_hypercall_t *const hvm_hypercall64_table[NR_hypercalls] = {
     HYPERCALL(set_timer_op),
     HYPERCALL(hvm_op),
     HYPERCALL(sysctl),
-    HYPERCALL(tmem_op)
+    HYPERCALL(tmem_op),
+    HYPERCALL(v4v_op)
 };
 
 #define COMPAT_CALL(x)                                        \
@@ -3200,7 +3201,8 @@ static hvm_hypercall_t *const hvm_hypercall32_table[NR_hypercalls] = {
     COMPAT_CALL(set_timer_op),
     HYPERCALL(hvm_op),
     HYPERCALL(sysctl),
-    HYPERCALL(tmem_op)
+    HYPERCALL(tmem_op),
+    HYPERCALL(v4v_op)
 };
 
 int hvm_do_hypercall(struct cpu_user_regs *regs)
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index 2f606ab..f518a03 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -414,6 +414,7 @@ ENTRY(compat_hypercall_table)
         .quad do_domctl
         .quad compat_kexec_op
         .quad do_tmem_op
+        .quad do_v4v_op
         .rept __HYPERVISOR_arch_0-((.-compat_hypercall_table)/8)
         .quad compat_ni_hypercall
         .endr
@@ -462,6 +463,7 @@ ENTRY(compat_hypercall_args_table)
         .byte 1 /* do_domctl                */
         .byte 2 /* compat_kexec_op          */
         .byte 1 /* do_tmem_op               */
+        .byte 5 /* do_v4v_op		    */
         .rept __HYPERVISOR_arch_0-(.-compat_hypercall_args_table)
         .byte 0 /* compat_ni_hypercall      */
         .endr
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 8156827..5f4e7bf 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -707,6 +707,7 @@ ENTRY(hypercall_table)
         .quad do_domctl
         .quad do_kexec_op
         .quad do_tmem_op
+        .quad do_v4v_op
         .rept __HYPERVISOR_arch_0-((.-hypercall_table)/8)
         .quad do_ni_hypercall
         .endr
@@ -755,6 +756,7 @@ ENTRY(hypercall_args_table)
         .byte 1 /* do_domctl            */
         .byte 2 /* do_kexec             */
         .byte 1 /* do_tmem_op           */
+        .byte 5 /* do_v4v_op		*/
         .rept __HYPERVISOR_arch_0-(.-hypercall_args_table)
         .byte 0 /* do_ni_hypercall      */
         .endr
diff --git a/xen/common/Makefile b/xen/common/Makefile
index c1c100f..ba63cec 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -45,6 +45,7 @@ obj-y += tmem_xen.o
 obj-y += radix-tree.o
 obj-y += rbtree.o
 obj-y += lzo.o
+obj-y += v4v.o
 
 obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma unlzo,$(n).init.o)
 
diff --git a/xen/common/domain.c b/xen/common/domain.c
index a1aa05e..94283e3 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -195,7 +195,8 @@ struct domain *domain_create(
 {
     struct domain *d, **pd;
     enum { INIT_xsm = 1u<<0, INIT_watchdog = 1u<<1, INIT_rangeset = 1u<<2,
-           INIT_evtchn = 1u<<3, INIT_gnttab = 1u<<4, INIT_arch = 1u<<5 };
+           INIT_evtchn = 1u<<3, INIT_gnttab = 1u<<4, INIT_arch = 1u<<5,
+           INIT_v4v = 1u<<6 };
     int err, init_status = 0;
     int poolid = CPUPOOLID_NONE;
 
@@ -307,6 +308,13 @@ struct domain *domain_create(
         spin_unlock(&domlist_update_lock);
     }
 
+    if ( !is_idle_domain(d) )
+    {
+        if ( v4v_init(d) != 0 )
+            goto fail;
+        init_status |= INIT_v4v;
+    }
+
     return d;
 
  fail:
@@ -315,6 +323,8 @@ struct domain *domain_create(
     xfree(d->mem_event);
     if ( init_status & INIT_arch )
         arch_domain_destroy(d);
+    if ( init_status & INIT_v4v )
+	v4v_destroy(d);
     if ( init_status & INIT_gnttab )
         grant_table_destroy(d);
     if ( init_status & INIT_evtchn )
@@ -468,6 +478,7 @@ int domain_kill(struct domain *d)
         domain_pause(d);
         d->is_dying = DOMDYING_dying;
         spin_barrier(&d->domain_lock);
+        v4v_destroy(d);
         evtchn_destroy(d);
         gnttab_release_mappings(d);
         tmem_destroy(d->tmem);
diff --git a/xen/common/v4v.c b/xen/common/v4v.c
new file mode 100644
index 0000000..afb9267
--- /dev/null
+++ b/xen/common/v4v.c
@@ -0,0 +1,1905 @@
+/******************************************************************************
+ * V4V
+ *
+ * Version 2 of v2v (Virtual-to-Virtual)
+ *
+ * Copyright (c) 2010, Citrix Systems
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#include <xen/config.h>
+#include <xen/mm.h>
+#include <xen/compat.h>
+#include <xen/init.h>
+#include <xen/lib.h>
+#include <xen/errno.h>
+#include <xen/sched.h>
+#include <xen/domain.h>
+#include <xen/v4v.h>
+#include <xen/event.h>
+#include <xen/guest_access.h>
+#include <asm/paging.h>
+#include <asm/p2m.h>
+#include <xen/keyhandler.h>
+#include <asm/types.h>
+
+#ifdef V4V_DEBUG
+#define v4v_dprintk(format, args...)            \
+    do {                                        \
+        printk("%s:%d " format,                 \
+               __FILE__, __LINE__, ## args );   \
+    } while ( 1 == 0 )
+#else
+#define v4v_dprintk(format, ... ) (void)0
+#endif
+
+/*
+ * Messages on the ring are padded to 128 bits
+ * Len here refers to the exact length of the data not including the
+ * 128 bit header. The message uses
+ * ((len +0xf) & ~0xf) + sizeof(v4v_ring_message_header) bytes
+ */
+#define V4V_ROUNDUP(a) (((a) +0xf ) & ~0xf)
+
+DEFINE_XEN_GUEST_HANDLE(uint8_t);
+static struct v4v_ring_info *v4v_ring_find_info(struct domain *d,
+                                                v4v_ring_id_t *id);
+
+static struct v4v_ring_info *v4v_ring_find_info_by_addr(struct domain *d,
+                                                        struct v4v_addr *a,
+                                                        domid_t p);
+
+typedef struct internal_v4v_iov
+{
+    XEN_GUEST_HANDLE(v4v_iov_t) guest_iov;
+    v4v_iov_t                   *xen_iov;
+} internal_v4v_iov_t;
+
+struct list_head viprules = LIST_HEAD_INIT(viprules);
+
+/*
+ * locks
+ */
+
+/*
+ * locking is organized as follows:
+ *
+ * the global lock v4v_lock: L1 protects the v4v elements
+ * of all struct domain *d in the system, it does not
+ * protect any of the elements of d->v4v, just their
+ * addresses. By extension since the destruction of
+ * a domain with a non-NULL d->v4v will need to free
+ * the d->v4v pointer, holding this lock gauruntees
+ * that no domains pointers in which v4v is interested
+ * become invalid whilst this lock is held.
+ */
+
+static DEFINE_RWLOCK(v4v_lock); /* L1 */
+
+/*
+ * the lock d->v4v->lock: L2:  Read on protects the hash table and
+ * the elements in the hash_table d->v4v->ring_hash, and
+ * the node and id fields in struct v4v_ring_info in the
+ * hash table. Write on L2 protects all of the elements of
+ * struct v4v_ring_info. To take L2 you must already have R(L1)
+ * W(L1) implies W(L2) and L3
+ *
+ * the lock v4v_ring_info *ringinfo; ringinfo->lock: L3:
+ * protects len,tx_ptr the guest ring, the
+ * guest ring_data and the pending list. To take L3 you must
+ * already have R(L2). W(L2) implies L3
+ */
+
+/*
+ * lock to protect the filtering rules list: viprules
+ *
+ * The write lock is held for viptables_del and viptables_add
+ * The read lock is held for viptable_list
+ */
+static DEFINE_RWLOCK(viprules_lock);
+
+
+/*
+ * Debugs
+ */
+
+#ifdef V4V_DEBUG
+static void __attribute__((unused))
+v4v_hexdump(void *_p, int len)
+{
+    uint8_t *buf = (uint8_t *)_p;
+    int i, j;
+
+    for ( i = 0; i < len; i += 16 )
+    {
+        printk(KERN_ERR "%p:", &buf[i]);
+        for ( j = 0; j < 16; ++j )
+        {
+            int k = i + j;
+            if ( k < len )
+                printk(" %02x", buf[k]);
+            else
+                printk("   ");
+        }
+        printk(" ");
+
+        for ( j = 0; j < 16; ++j )
+        {
+            int k = i + j;
+            if ( k < len )
+                printk("%c", ((buf[k] > 32) && (buf[k] < 127)) ? buf[k] : '.');
+            else
+                printk(" ");
+        }
+        printk("\n");
+    }
+}
+#endif
+
+
+/*
+ * Event channel
+ */
+
+static void
+v4v_signal_domain(struct domain *d)
+{
+    v4v_dprintk("send guest VIRQ_V4V domid:%d\n", d->domain_id);
+
+    evtchn_send(d, d->v4v->evtchn_port);
+}
+
+static void
+v4v_signal_domid(domid_t id)
+{
+    struct domain *d = get_domain_by_id(id);
+    if ( !d )
+        return;
+    v4v_signal_domain(d);
+    put_domain(d);
+}
+
+
+/*
+ * ring buffer
+ */
+
+static void
+v4v_ring_unmap(struct v4v_ring_info *ring_info)
+{
+    int i;
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    for ( i = 0; i < ring_info->npage; ++i )
+    {
+        if ( !ring_info->mfn_mapping[i] )
+            continue;
+        v4v_dprintk("unmapping page %p from %p\n",
+                    (void*)mfn_x(ring_info->mfns[i]),
+                    ring_info->mfn_mapping[i]);
+
+        unmap_domain_page(ring_info->mfn_mapping[i]);
+        ring_info->mfn_mapping[i] = NULL;
+    }
+}
+
+static uint8_t *
+v4v_ring_map_page(struct v4v_ring_info *ring_info, int i)
+{
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    if ( i >= ring_info->npage )
+        return NULL;
+    if ( ring_info->mfn_mapping[i] )
+        return ring_info->mfn_mapping[i];
+    ring_info->mfn_mapping[i] = map_domain_page(mfn_x(ring_info->mfns[i]));
+
+    v4v_dprintk("mapping page %p to %p\n",
+                (void *)mfn_x(ring_info->mfns[i]),
+                ring_info->mfn_mapping[i]);
+    return ring_info->mfn_mapping[i];
+}
+
+static int
+v4v_memcpy_from_guest_ring(void *_dst, struct v4v_ring_info *ring_info,
+                           uint32_t offset, uint32_t len)
+{
+    int page = offset >> PAGE_SHIFT;
+    uint8_t *src;
+    uint8_t *dst = _dst;
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    offset &= PAGE_SIZE - 1;
+
+    while ( (offset + len) > PAGE_SIZE )
+    {
+        src = v4v_ring_map_page(ring_info, page);
+
+        if ( !src )
+            return -EFAULT;
+
+        v4v_dprintk("memcpy(%p,%p+%d,%d)\n",
+                    dst, src, offset,
+                    (int)(PAGE_SIZE - offset));
+        memcpy(dst, src + offset, PAGE_SIZE - offset);
+
+        page++;
+        len -= PAGE_SIZE - offset;
+        dst += PAGE_SIZE - offset;
+        offset = 0;
+    }
+
+    src = v4v_ring_map_page(ring_info, page);
+    if ( !src )
+        return -EFAULT;
+
+    v4v_dprintk("memcpy(%p,%p+%d,%d)\n", dst, src, offset, len);
+    memcpy(dst, src + offset, len);
+
+    return 0;
+}
+
+static int
+v4v_update_tx_ptr(struct v4v_ring_info *ring_info, uint32_t tx_ptr)
+{
+    uint8_t *dst = v4v_ring_map_page(ring_info, 0);
+    uint32_t *p = (uint32_t *)(dst + offsetof(v4v_ring_t, tx_ptr));
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    if ( !dst )
+        return -EFAULT;
+    write_atomic(p, tx_ptr);
+    mb();
+    return 0;
+}
+
+static int
+v4v_copy_from_guest_maybe(void *dst, void *src,
+                          XEN_GUEST_HANDLE(uint8_t) src_hnd,
+                          uint32_t len)
+{
+    int rc = 0;
+
+    if ( src )
+        memcpy(dst, src, len);
+    else
+        rc = copy_from_guest(dst, src_hnd, len);
+    return rc;
+}
+
+static int
+v4v_memcpy_to_guest_ring(struct v4v_ring_info *ring_info,
+                         uint32_t offset,
+                         void *src,
+                         XEN_GUEST_HANDLE(uint8_t) src_hnd,
+                         uint32_t len)
+{
+    int page = offset >> PAGE_SHIFT;
+    uint8_t *dst;
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    offset &= PAGE_SIZE - 1;
+
+    while ( (offset + len) > PAGE_SIZE )
+    {
+        dst = v4v_ring_map_page(ring_info, page);
+        if ( !dst )
+            return -EFAULT;
+
+        if ( v4v_copy_from_guest_maybe(dst + offset, src, src_hnd,
+                                       PAGE_SIZE - offset) )
+            return -EFAULT;
+
+        page++;
+        len -= PAGE_SIZE - offset;
+        if ( src )
+            src += (PAGE_SIZE - offset);
+        else
+            guest_handle_add_offset(src_hnd, PAGE_SIZE - offset);
+        offset = 0;
+    }
+
+    dst = v4v_ring_map_page(ring_info, page);
+    if ( !dst )
+        return -EFAULT;
+
+    if ( v4v_copy_from_guest_maybe(dst + offset, src, src_hnd, len) )
+        return -EFAULT;
+
+    return 0;
+}
+
+static int
+v4v_ringbuf_get_rx_ptr(struct domain *d, struct v4v_ring_info *ring_info,
+                        uint32_t * rx_ptr)
+{
+    v4v_ring_t *ringp;
+
+    if ( ring_info->npage == 0 )
+        return -1;
+
+    ringp = map_domain_page(mfn_x(ring_info->mfns[0]));
+
+    v4v_dprintk("v4v_ringbuf_payload_space: mapped %p to %p\n",
+                (void *)mfn_x(ring_info->mfns[0]), ringp);
+    if ( !ringp )
+        return -1;
+
+    write_atomic(rx_ptr, ringp->rx_ptr);
+    mb();
+
+    unmap_domain_page(mfn_x(ring_info->mfns[0]));
+    return 0;
+}
+
+uint32_t
+v4v_ringbuf_payload_space(struct domain * d, struct v4v_ring_info * ring_info)
+{
+    v4v_ring_t ring;
+    int32_t ret;
+
+    ring.tx_ptr = ring_info->tx_ptr;
+    ring.len = ring_info->len;
+
+    if ( v4v_ringbuf_get_rx_ptr(d, ring_info, &ring.rx_ptr) )
+        return 0;
+
+    v4v_dprintk("v4v_ringbuf_payload_space:tx_ptr=%d rx_ptr=%d\n",
+                (int)ring.tx_ptr, (int)ring.rx_ptr);
+    if ( ring.rx_ptr == ring.tx_ptr )
+        return ring.len - sizeof (struct v4v_ring_message_header);
+
+    ret = ring.rx_ptr - ring.tx_ptr;
+    if ( ret < 0 )
+        ret += ring.len;
+
+    ret -= sizeof (struct v4v_ring_message_header);
+    ret -= V4V_ROUNDUP(1);
+
+    return (ret < 0) ? 0 : ret;
+}
+
+static unsigned long
+v4v_iov_copy(v4v_iov_t *iov, internal_v4v_iov_t *iovs)
+{
+    if (iovs->xen_iov == NULL)
+    {
+        return copy_from_guest(iov, iovs->guest_iov, 1);
+    }
+    else
+    {
+        *iov = *(iovs->xen_iov);
+        return 0;
+    }
+}
+
+static void
+v4v_iov_add_offset(internal_v4v_iov_t *iovs, int offset)
+{
+    if (iovs->xen_iov == NULL)
+        guest_handle_add_offset(iovs->guest_iov, offset);
+    else
+        iovs->xen_iov += offset;
+}
+
+static long
+v4v_iov_count(internal_v4v_iov_t iovs, int niov)
+{
+    v4v_iov_t iov;
+    size_t ret = 0;
+
+    while ( niov-- )
+    {
+        if ( v4v_iov_copy(&iov, &iovs) )
+            return -EFAULT;
+
+        /* message bigger than 2G can't be sent */
+        if (ret > 2L * 1024 * 1024 * 1024)
+            return -EMSGSIZE;
+
+        ret += iov.iov_len;
+        v4v_iov_add_offset(&iovs, 1);
+    }
+
+    return ret;
+}
+
+static long
+v4v_ringbuf_insertv(struct domain *d,
+                    struct v4v_ring_info *ring_info,
+                    v4v_ring_id_t *src_id, uint32_t proto,
+                    internal_v4v_iov_t iovs, uint32_t niov,
+                    size_t len)
+{
+    v4v_ring_t ring;
+    struct v4v_ring_message_header mh = { 0 };
+    int32_t sp;
+    long happy_ret;
+    int32_t ret = 0;
+    XEN_GUEST_HANDLE(uint8_t) empty_hnd = { 0 };
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    happy_ret = len;
+
+    if ( (V4V_ROUNDUP(len) + sizeof (struct v4v_ring_message_header) ) >=
+            ring_info->len)
+        return -EMSGSIZE;
+
+    do
+    {
+        if ( (ret = v4v_memcpy_from_guest_ring(&ring, ring_info, 0,
+                                               sizeof (ring))) )
+            break;
+
+        ring.tx_ptr = ring_info->tx_ptr;
+        ring.len = ring_info->len;
+
+        v4v_dprintk("ring.tx_ptr=%d ring.rx_ptr=%d ring.len=%d ring_info->tx_ptr=%d\n",
+                    ring.tx_ptr, ring.rx_ptr, ring.len, ring_info->tx_ptr);
+
+        if ( ring.rx_ptr == ring.tx_ptr )
+            sp = ring_info->len;
+        else
+        {
+            sp = ring.rx_ptr - ring.tx_ptr;
+            if ( sp < 0 )
+                sp += ring.len;
+        }
+
+        if ( (V4V_ROUNDUP(len) + sizeof (struct v4v_ring_message_header)) >= sp )
+        {
+            v4v_dprintk("EAGAIN\n");
+            ret = -EAGAIN;
+            break;
+        }
+
+        mh.len = len + sizeof (struct v4v_ring_message_header);
+        mh.source = src_id->addr;
+        mh.message_type = proto;
+
+        if ( (ret = v4v_memcpy_to_guest_ring(ring_info,
+                                             ring.tx_ptr + sizeof (v4v_ring_t),
+                                             &mh, empty_hnd,
+                                             sizeof (mh))) )
+            break;
+
+        ring.tx_ptr += sizeof (mh);
+        if ( ring.tx_ptr == ring_info->len )
+            ring.tx_ptr = 0;
+
+        while ( niov-- )
+        {
+            XEN_GUEST_HANDLE(uint8_t) buf_hnd;
+            v4v_iov_t iov;
+
+            if ( v4v_iov_copy(&iov, &iovs) )
+            {
+                ret = -EFAULT;
+                break;
+            }
+
+            buf_hnd.p = (uint8_t *)iov.iov_base; //FIXME
+            len = iov.iov_len;
+
+            if ( unlikely(!guest_handle_okay(buf_hnd, len)) )
+            {
+                ret = -EFAULT;
+                break;
+            }
+
+            sp = ring.len - ring.tx_ptr;
+
+            if ( len > sp )
+            {
+                ret = v4v_memcpy_to_guest_ring(ring_info,
+                        ring.tx_ptr + sizeof (v4v_ring_t),
+                        NULL, buf_hnd, sp);
+                if ( ret )
+                    break;
+
+                ring.tx_ptr = 0;
+                len -= sp;
+                guest_handle_add_offset(buf_hnd, sp);
+            }
+
+            ret = v4v_memcpy_to_guest_ring(ring_info,
+                    ring.tx_ptr + sizeof (v4v_ring_t),
+                    NULL, buf_hnd, len);
+            if ( ret )
+                break;
+
+            ring.tx_ptr += len;
+
+            if ( ring.tx_ptr == ring_info->len )
+                ring.tx_ptr = 0;
+
+            v4v_iov_add_offset(&iovs, 1);
+        }
+        if ( ret )
+            break;
+
+        ring.tx_ptr = V4V_ROUNDUP(ring.tx_ptr);
+
+        if ( ring.tx_ptr >= ring_info->len )
+            ring.tx_ptr -= ring_info->len;
+
+        mb();
+        ring_info->tx_ptr = ring.tx_ptr;
+        if ( (ret = v4v_update_tx_ptr(ring_info, ring.tx_ptr)) )
+            break;
+    }
+    while ( 0 );
+
+    v4v_ring_unmap(ring_info);
+
+    return ret ? ret : happy_ret;
+}
+
+
+
+/* pending */
+static void
+v4v_pending_remove_ent(struct v4v_pending_ent *ent)
+{
+    hlist_del(&ent->node);
+    xfree(ent);
+}
+
+static void
+v4v_pending_remove_all(struct v4v_ring_info *info)
+{
+    struct hlist_node *node, *next;
+    struct v4v_pending_ent *pending_ent;
+
+    ASSERT(spin_is_locked(&info->lock));
+    hlist_for_each_entry_safe(pending_ent, node, next, &info->pending,
+            node) v4v_pending_remove_ent(pending_ent);
+}
+
+static void
+v4v_pending_notify(struct domain *caller_d, struct hlist_head *to_notify)
+{
+    struct hlist_node *node, *next;
+    struct v4v_pending_ent *pending_ent;
+
+    ASSERT(rw_is_locked(&v4v_lock));
+
+    hlist_for_each_entry_safe(pending_ent, node, next, to_notify, node)
+    {
+        hlist_del(&pending_ent->node);
+        v4v_signal_domid(pending_ent->id);
+        xfree(pending_ent);
+    }
+
+}
+
+static void
+v4v_pending_find(struct domain *d, struct v4v_ring_info *ring_info,
+                 uint32_t payload_space, struct hlist_head *to_notify)
+{
+    struct hlist_node *node, *next;
+    struct v4v_pending_ent *ent;
+
+    ASSERT(rw_is_locked(&d->v4v->lock));
+
+    spin_lock(&ring_info->lock);
+    hlist_for_each_entry_safe(ent, node, next, &ring_info->pending, node)
+    {
+        if ( payload_space >= ent->len )
+        {
+            hlist_del(&ent->node);
+            hlist_add_head(&ent->node, to_notify);
+        }
+    }
+    spin_unlock(&ring_info->lock);
+}
+
+/*caller must have L3 */
+static int
+v4v_pending_queue(struct v4v_ring_info *ring_info, domid_t src_id, int len)
+{
+    struct v4v_pending_ent *ent = xmalloc(struct v4v_pending_ent);
+
+    if ( !ent )
+    {
+        v4v_dprintk("ENOMEM\n");
+        return -ENOMEM;
+    }
+
+    ent->len = len;
+    ent->id = src_id;
+
+    hlist_add_head(&ent->node, &ring_info->pending);
+
+    return 0;
+}
+
+/* L3 */
+static int
+v4v_pending_requeue(struct v4v_ring_info *ring_info, domid_t src_id, int len)
+{
+    struct hlist_node *node;
+    struct v4v_pending_ent *ent;
+
+    hlist_for_each_entry(ent, node, &ring_info->pending, node)
+    {
+        if ( ent->id == src_id )
+        {
+            if ( ent->len < len )
+                ent->len = len;
+            return 0;
+        }
+    }
+
+    return v4v_pending_queue(ring_info, src_id, len);
+}
+
+
+/* L3 */
+static void
+v4v_pending_cancel(struct v4v_ring_info *ring_info, domid_t src_id)
+{
+    struct hlist_node *node, *next;
+    struct v4v_pending_ent *ent;
+
+    hlist_for_each_entry_safe(ent, node, next, &ring_info->pending, node)
+    {
+        if ( ent->id == src_id)
+        {
+            hlist_del(&ent->node);
+            xfree(ent);
+        }
+    }
+}
+
+/*
+ * ring data
+ */
+
+/*Caller should hold R(L1)*/
+static int
+v4v_fill_ring_data(struct domain *src_d,
+                   XEN_GUEST_HANDLE(v4v_ring_data_ent_t) data_ent_hnd)
+{
+    v4v_ring_data_ent_t ent;
+    struct domain *dst_d;
+    struct v4v_ring_info *ring_info;
+
+    if ( copy_from_guest(&ent, data_ent_hnd, 1) )
+    {
+        v4v_dprintk("EFAULT\n");
+        return -EFAULT;
+    }
+
+    v4v_dprintk("v4v_fill_ring_data: ent.ring.domain=%d,ent.ring.port=%u\n",
+                (int)ent.ring.domain, (int)ent.ring.port);
+
+    ent.flags = 0;
+
+    dst_d = get_domain_by_id(ent.ring.domain);
+
+    if ( dst_d && dst_d->v4v )
+    {
+        read_lock(&dst_d->v4v->lock);
+        ring_info = v4v_ring_find_info_by_addr(dst_d, &ent.ring,
+                                               src_d->domain_id);
+
+        if ( ring_info )
+        {
+            uint32_t space_avail;
+
+            ent.flags |= V4V_RING_DATA_F_EXISTS;
+            ent.max_message_size =
+                ring_info->len - sizeof (struct v4v_ring_message_header) -
+                V4V_ROUNDUP(1);
+            spin_lock(&ring_info->lock);
+
+            space_avail = v4v_ringbuf_payload_space(dst_d, ring_info);
+
+            if ( space_avail >= ent.space_required )
+            {
+                v4v_pending_cancel(ring_info, src_d->domain_id);
+                ent.flags |= V4V_RING_DATA_F_SUFFICIENT;
+            }
+            else
+            {
+                v4v_pending_requeue(ring_info, src_d->domain_id,
+                        ent.space_required);
+                ent.flags |= V4V_RING_DATA_F_PENDING;
+            }
+
+            spin_unlock(&ring_info->lock);
+
+            if ( space_avail == ent.max_message_size )
+                ent.flags |= V4V_RING_DATA_F_EMPTY;
+
+        }
+        read_unlock(&dst_d->v4v->lock);
+    }
+
+    if ( dst_d )
+        put_domain(dst_d);
+
+    if ( copy_field_to_guest(data_ent_hnd, &ent, flags) )
+    {
+        v4v_dprintk("EFAULT\n");
+        return -EFAULT;
+    }
+    return 0;
+}
+
+/*Called should hold no more than R(L1) */
+static int
+v4v_fill_ring_datas(struct domain *d, int nent,
+                     XEN_GUEST_HANDLE(v4v_ring_data_ent_t) data_ent_hnd)
+{
+    int ret = 0;
+
+    read_lock(&v4v_lock);
+    while ( !ret && nent-- )
+    {
+        ret = v4v_fill_ring_data(d, data_ent_hnd);
+        guest_handle_add_offset(data_ent_hnd, 1);
+    }
+    read_unlock(&v4v_lock);
+    return ret;
+}
+
+/*
+ * ring
+ */
+static int
+v4v_find_ring_mfns(struct domain *d, struct v4v_ring_info *ring_info,
+                   uint32_t npage, XEN_GUEST_HANDLE(xen_pfn_t) pfn_hnd)
+{
+    int i,j;
+    mfn_t *mfns;
+    uint8_t **mfn_mapping;
+    unsigned long mfn;
+    struct page_info *page;
+    int ret = 0;
+
+    if ( (npage << PAGE_SHIFT) < ring_info->len )
+    {
+        v4v_dprintk("EINVAL\n");
+        return -EINVAL;
+    }
+
+    mfns = xmalloc_array(mfn_t, npage);
+    if ( !mfns )
+    {
+        v4v_dprintk("ENOMEM\n");
+        return -ENOMEM;
+    }
+
+    mfn_mapping = xmalloc_array(uint8_t *, npage);
+    if ( !mfn_mapping )
+    {
+        xfree(mfns);
+        return -ENOMEM;
+    }
+
+    for ( i = 0; i < npage; ++i )
+    {
+        unsigned long pfn;
+        p2m_type_t p2mt;
+
+        if ( copy_from_guest_offset(&pfn, pfn_hnd, i, 1) )
+        {
+            ret = -EFAULT;
+            v4v_dprintk("EFAULT\n");
+            break;
+        }
+
+        mfn = mfn_x(get_gfn(d, pfn, &p2mt));
+        if ( !mfn_valid(mfn) )
+        {
+            printk(KERN_ERR "v4v domain %d passed invalid mfn %"PRI_mfn" ring %p seq %d\n",
+                    d->domain_id, mfn, ring_info, i);
+            ret = -EINVAL;
+            break;
+        }
+        page = mfn_to_page(mfn);
+        if ( !get_page_and_type(page, d, PGT_writable_page) )
+        {
+            printk(KERN_ERR "v4v domain %d passed wrong type mfn %"PRI_mfn" ring %p seq %d\n",
+                    d->domain_id, mfn, ring_info, i);
+            ret = -EINVAL;
+            break;
+        }
+        mfns[i] = _mfn(mfn);
+        v4v_dprintk("v4v_find_ring_mfns: %d: %lx -> %lx\n",
+                    i, (unsigned long)pfn, (unsigned long)mfn_x(mfns[i]));
+        mfn_mapping[i] = NULL;
+        put_gfn(d, pfn);
+    }
+
+    if ( !ret )
+    {
+        ring_info->npage = npage;
+        ring_info->mfns = mfns;
+        ring_info->mfn_mapping = mfn_mapping;
+    }
+    else
+    {
+        j = i;
+        for ( i = 0; i < j; ++i )
+            if ( mfn_x(mfns[i]) != 0 )
+                put_page_and_type(mfn_to_page(mfn_x(mfns[i])));
+        xfree(mfn_mapping);
+        xfree(mfns);
+    }
+    return ret;
+}
+
+
+static struct v4v_ring_info *
+v4v_ring_find_info(struct domain *d, v4v_ring_id_t *id)
+{
+    uint16_t hash;
+    struct hlist_node *node;
+    struct v4v_ring_info *ring_info;
+
+    ASSERT(rw_is_locked(&d->v4v->lock));
+
+    hash = v4v_hash_fn(id);
+
+    v4v_dprintk("ring_find_info: d->v4v=%p, d->v4v->ring_hash[%d]=%p id=%p\n",
+                d->v4v, (int)hash, d->v4v->ring_hash[hash].first, id);
+    v4v_dprintk("ring_find_info: id.addr.port=%d id.addr.domain=%d id.addr.partner=%d\n",
+                id->addr.port, id->addr.domain, id->partner);
+
+    hlist_for_each_entry(ring_info, node, &d->v4v->ring_hash[hash], node)
+    {
+        v4v_ring_id_t *cmpid = &ring_info->id;
+
+        if ( cmpid->addr.port == id->addr.port &&
+             cmpid->addr.domain == id->addr.domain &&
+             cmpid->partner == id->partner)
+        {
+            v4v_dprintk("ring_find_info: ring_info=%p\n", ring_info);
+            return ring_info;
+        }
+    }
+    v4v_dprintk("ring_find_info: no ring_info found\n");
+    return NULL;
+}
+
+static struct v4v_ring_info *
+v4v_ring_find_info_by_addr(struct domain *d, struct v4v_addr *a, domid_t p)
+{
+    v4v_ring_id_t id;
+    struct v4v_ring_info *ret;
+
+    ASSERT(rw_is_locked(&d->v4v->lock));
+
+    if ( !a )
+        return NULL;
+
+    id.addr.port = a->port;
+    id.addr.domain = d->domain_id;
+    id.partner = p;
+
+    ret = v4v_ring_find_info(d, &id);
+    if ( ret )
+        return ret;
+
+    id.partner = V4V_DOMID_ANY;
+
+    return v4v_ring_find_info(d, &id);
+}
+
+static void v4v_ring_remove_mfns(struct domain *d, struct v4v_ring_info *ring_info)
+{
+    int i;
+
+    ASSERT(rw_is_write_locked(&d->v4v->lock));
+
+    if ( ring_info->mfns )
+    {
+        for ( i = 0; i < ring_info->npage; ++i )
+            if ( mfn_x(ring_info->mfns[i]) != 0 )
+                put_page_and_type(mfn_to_page(mfn_x(ring_info->mfns[i])));
+        xfree(ring_info->mfns);
+    }
+    if ( ring_info->mfn_mapping )
+        xfree(ring_info->mfn_mapping);
+    ring_info->mfns = NULL;
+
+}
+
+static void
+v4v_ring_remove_info(struct domain *d, struct v4v_ring_info *ring_info)
+{
+    ASSERT(rw_is_write_locked(&d->v4v->lock));
+
+    spin_lock(&ring_info->lock);
+
+    v4v_pending_remove_all(ring_info);
+    hlist_del(&ring_info->node);
+    v4v_ring_remove_mfns(d, ring_info);
+
+    spin_unlock(&ring_info->lock);
+
+    xfree(ring_info);
+}
+
+/* Call from guest to unpublish a ring */
+static long
+v4v_ring_remove(struct domain *d, XEN_GUEST_HANDLE(v4v_ring_t) ring_hnd)
+{
+    struct v4v_ring ring;
+    struct v4v_ring_info *ring_info;
+    int ret = 0;
+
+    read_lock(&v4v_lock);
+
+    do
+    {
+        if ( !d->v4v )
+        {
+            v4v_dprintk("EINVAL\n");
+            ret = -EINVAL;
+            break;
+        }
+
+        if ( copy_from_guest(&ring, ring_hnd, 1) )
+        {
+            v4v_dprintk("EFAULT\n");
+            ret = -EFAULT;
+            break;
+        }
+
+        if ( ring.magic != V4V_RING_MAGIC )
+        {
+            v4v_dprintk("ring.magic(%lx) != V4V_RING_MAGIC(%lx), EINVAL\n",
+                    ring.magic, V4V_RING_MAGIC);
+            ret = -EINVAL;
+            break;
+        }
+
+        ring.id.addr.domain = d->domain_id;
+
+        write_lock(&d->v4v->lock);
+        ring_info = v4v_ring_find_info(d, &ring.id);
+
+        if ( ring_info )
+            v4v_ring_remove_info(d, ring_info);
+
+        write_unlock(&d->v4v->lock);
+
+        if ( !ring_info )
+        {
+            v4v_dprintk("ENOENT\n");
+            ret = -ENOENT;
+            break;
+        }
+
+    }
+    while ( 0 );
+
+    read_unlock(&v4v_lock);
+    return ret;
+}
+
+/* call from guest to publish a ring */
+static long
+v4v_ring_add(struct domain *d, XEN_GUEST_HANDLE(v4v_ring_t) ring_hnd,
+             uint32_t npage, XEN_GUEST_HANDLE(xen_pfn_t) pfn_hnd)
+{
+    struct v4v_ring ring;
+    struct v4v_ring_info *ring_info;
+    int need_to_insert = 0;
+    int ret = 0;
+
+    if ( (long)ring_hnd.p & (PAGE_SIZE - 1) )
+    {
+        v4v_dprintk("EINVAL\n");
+        return -EINVAL;
+    }
+
+    read_lock(&v4v_lock);
+    do
+    {
+        if ( !d->v4v )
+        {
+            v4v_dprintk(" !d->v4v, EINVAL\n");
+            ret = -EINVAL;
+            break;
+        }
+
+        if ( copy_from_guest(&ring, ring_hnd, 1) )
+        {
+            v4v_dprintk(" copy_from_guest failed, EFAULT\n");
+            ret = -EFAULT;
+            break;
+        }
+
+        if ( ring.magic != V4V_RING_MAGIC )
+        {
+            v4v_dprintk("ring.magic(%lx) != V4V_RING_MAGIC(%lx), EINVAL\n",
+                        ring.magic, V4V_RING_MAGIC);
+            ret = -EINVAL;
+            break;
+        }
+
+        if ( (ring.len <
+                    (sizeof (struct v4v_ring_message_header) + V4V_ROUNDUP(1) +
+                     V4V_ROUNDUP(1))) || (V4V_ROUNDUP(ring.len) != ring.len) )
+        {
+            v4v_dprintk("EINVAL\n");
+            ret = -EINVAL;
+            break;
+        }
+
+        ring.id.addr.domain = d->domain_id;
+        if ( copy_field_to_guest(ring_hnd, &ring, id) )
+        {
+            v4v_dprintk("EFAULT\n");
+            ret = -EFAULT;
+            break;
+        }
+
+        /*
+         * no need for a lock yet, because only we know about this
+         * set the tx pointer if it looks bogus (we don't reset it
+         * because this might be a re-register after S4)
+         */
+        if ( (ring.tx_ptr >= ring.len)
+                || (V4V_ROUNDUP(ring.tx_ptr) != ring.tx_ptr) )
+        {
+            ring.tx_ptr = ring.rx_ptr;
+        }
+        copy_field_to_guest(ring_hnd, &ring, tx_ptr);
+
+        read_lock(&d->v4v->lock);
+        ring_info = v4v_ring_find_info(d, &ring.id);
+
+        if ( !ring_info )
+        {
+            read_unlock(&d->v4v->lock);
+            ring_info = xmalloc(struct v4v_ring_info);
+            if ( !ring_info )
+            {
+                v4v_dprintk("ENOMEM\n");
+                ret = -ENOMEM;
+                break;
+            }
+            need_to_insert++;
+            spin_lock_init(&ring_info->lock);
+            INIT_HLIST_HEAD(&ring_info->pending);
+            ring_info->mfns = NULL;
+
+        }
+        else
+        {
+            /*
+             * Ring info already existed.
+             */
+            printk(KERN_INFO "v4v: dom%d ring already registered\n",
+                    current->domain->domain_id);
+            ret = -EEXIST;
+            break;
+        }
+
+        spin_lock(&ring_info->lock);
+        ring_info->id = ring.id;
+        ring_info->len = ring.len;
+        ring_info->tx_ptr = ring.tx_ptr;
+        ring_info->ring = ring_hnd;
+        if ( ring_info->mfns )
+            xfree(ring_info->mfns);
+        ret = v4v_find_ring_mfns(d, ring_info, npage, pfn_hnd);
+        spin_unlock(&ring_info->lock);
+        if ( ret )
+            break;
+
+        if ( !need_to_insert )
+        {
+            read_unlock(&d->v4v->lock);
+        }
+        else
+        {
+            uint16_t hash = v4v_hash_fn(&ring.id);
+            write_lock(&d->v4v->lock);
+            hlist_add_head(&ring_info->node, &d->v4v->ring_hash[hash]);
+            write_unlock(&d->v4v->lock);
+        }
+    }
+    while ( 0 );
+
+    read_unlock(&v4v_lock);
+    return ret;
+}
+
+
+/*
+ * io
+ */
+
+static void
+v4v_notify_ring(struct domain *d, struct v4v_ring_info *ring_info,
+                struct hlist_head *to_notify)
+{
+    uint32_t space;
+
+    ASSERT(rw_is_locked(&v4v_lock));
+    ASSERT(rw_is_locked(&d->v4v->lock));
+
+    spin_lock(&ring_info->lock);
+    space = v4v_ringbuf_payload_space(d, ring_info);
+    spin_unlock(&ring_info->lock);
+
+    v4v_pending_find(d, ring_info, space, to_notify);
+}
+
+/*notify hypercall*/
+static long
+v4v_notify(struct domain *d,
+           XEN_GUEST_HANDLE(v4v_ring_data_t) ring_data_hnd)
+{
+    v4v_ring_data_t ring_data;
+    HLIST_HEAD(to_notify);
+    int i;
+    int ret = 0;
+
+    read_lock(&v4v_lock);
+
+    if ( !d->v4v )
+    {
+        read_unlock(&v4v_lock);
+        v4v_dprintk("!d->v4v, ENODEV\n");
+        return -ENODEV;
+    }
+
+    read_lock(&d->v4v->lock);
+    for ( i = 0; i < V4V_HTABLE_SIZE; ++i )
+    {
+        struct hlist_node *node, *next;
+        struct v4v_ring_info *ring_info;
+
+        hlist_for_each_entry_safe(ring_info, node,
+                next, &d->v4v->ring_hash[i],
+                node)
+        {
+            v4v_notify_ring(d, ring_info, &to_notify);
+        }
+    }
+    read_unlock(&d->v4v->lock);
+
+    if ( !hlist_empty(&to_notify) )
+        v4v_pending_notify(d, &to_notify);
+
+    do
+    {
+        if ( !guest_handle_is_null(ring_data_hnd) )
+        {
+            /* Quick sanity check on ring_data_hnd */
+            if ( copy_field_from_guest(&ring_data, ring_data_hnd, magic) )
+            {
+                v4v_dprintk("copy_field_from_guest failed\n");
+                ret = -EFAULT;
+                break;
+            }
+
+            if ( ring_data.magic != V4V_RING_DATA_MAGIC )
+            {
+                v4v_dprintk("ring.magic(%lx) != V4V_RING_MAGIC(%lx), EINVAL\n",
+                        ring_data.magic, V4V_RING_MAGIC);
+                ret = -EINVAL;
+                break;
+            }
+
+            if ( copy_from_guest(&ring_data, ring_data_hnd, 1) )
+            {
+                v4v_dprintk("copy_from_guest failed\n");
+                ret = -EFAULT;
+                break;
+            }
+
+            {
+                XEN_GUEST_HANDLE(v4v_ring_data_ent_t) ring_data_ent_hnd;
+                ring_data_ent_hnd =
+                    guest_handle_for_field(ring_data_hnd, v4v_ring_data_ent_t, data[0]);
+                ret = v4v_fill_ring_datas(d, ring_data.nent, ring_data_ent_hnd);
+            }
+        }
+    }
+    while ( 0 );
+
+    read_unlock(&v4v_lock);
+
+    return ret;
+}
+
+#ifdef V4V_DEBUG
+void
+v4v_viptables_print_rule(struct v4v_viptables_rule_node *node)
+{
+    v4v_viptables_rule_t *rule;
+
+    if ( node == NULL )
+    {
+        printk("(null)\n");
+        return;
+    }
+
+    rule = &node->rule;
+
+    if ( rule->accept == 1 )
+        printk("ACCEPT");
+    else
+        printk("REJECT");
+
+    printk(" ");
+
+    if ( rule->src.domain == V4V_DOMID_ANY )
+        printk("*");
+    else
+        printk("%i", rule->src.domain);
+
+    printk(":");
+
+    if ( rule->src.port == -1 )
+        printk("*");
+    else
+        printk("%i", rule->src.port);
+
+    printk(" -> ");
+
+    if ( rule->dst.domain == V4V_DOMID_ANY )
+        printk("*");
+    else
+        printk("%i", rule->dst.domain);
+
+    printk(":");
+
+    if ( rule->dst.port == -1 )
+        printk("*");
+    else
+        printk("%i", rule->dst.port);
+
+    printk("\n");
+}
+#endif /* V4V_DEBUG */
+
+int
+v4v_viptables_add(struct domain *src_d,
+                  XEN_GUEST_HANDLE(v4v_viptables_rule_t) rule,
+                  int32_t position)
+{
+    struct v4v_viptables_rule_node* new = NULL;
+    struct list_head* tmp;
+
+    ASSERT(rw_is_write_locked(&viprules_lock));
+
+    /* First rule is n.1 */
+    position--;
+
+    new = xmalloc(struct v4v_viptables_rule_node);
+    if ( new == NULL )
+        return -ENOMEM;
+
+    if ( copy_from_guest(&new->rule, rule, 1) )
+    {
+        xfree(new);
+        return -EFAULT;
+    }
+
+#ifdef V4V_DEBUG
+    printk(KERN_ERR "VIPTables: ");
+    v4v_viptables_print_rule(new);
+#endif /* V4V_DEBUG */
+
+    tmp = &viprules;
+    while ( position != 0 && tmp->next != &viprules )
+    {
+        tmp = tmp->next;
+        position--;
+    }
+    list_add(&new->list, tmp);
+
+    return 0;
+}
+
+int
+v4v_viptables_del(struct domain *src_d,
+                  XEN_GUEST_HANDLE(v4v_viptables_rule_t) rule_hnd,
+                  int32_t position)
+{
+    struct list_head *tmp = NULL;
+    struct list_head *next = NULL;
+    struct v4v_viptables_rule_node *node;
+
+    ASSERT(rw_is_write_locked(&viprules_lock));
+
+    v4v_dprintk("v4v_viptables_del position:%d\n", position);
+
+    if ( position != -1 )
+    {
+        /* We want to delete the rule number <position> */
+        tmp = &viprules;
+        while ( position != 0 && tmp->next != &viprules )
+        {
+            tmp = tmp->next;
+            position--;
+        }
+    }
+    else if ( !guest_handle_is_null(rule_hnd) )
+    {
+        struct v4v_viptables_rule r;
+
+        if ( copy_from_guest(&r, rule_hnd, 1) )
+            return -EFAULT;
+
+        list_for_each(tmp, &viprules)
+        {
+            node = list_entry(tmp, struct v4v_viptables_rule_node, list);
+
+            if ( (node->rule.src.domain == r.src.domain) &&
+                 (node->rule.src.port   == r.src.port)   &&
+                 (node->rule.dst.domain == r.dst.domain) &&
+                 (node->rule.dst.port   == r.dst.port))
+            {
+                position = 0;
+                break;
+            }
+        }
+    }
+    else
+    {
+        /* We want to flush the rules! */
+        printk(KERN_ERR "VIPTables: flushing rules\n");
+        list_for_each_safe(tmp, next, &viprules)
+        {
+            node = list_entry(tmp, struct v4v_viptables_rule_node, list);
+            list_del(tmp);
+            xfree(node);
+        }
+    }
+
+    if ( position == 0 && tmp != &viprules )
+    {
+        node = list_entry(tmp, struct v4v_viptables_rule_node, list);
+#ifdef V4V_DEBUG
+        printk(KERN_ERR "VIPTables: deleting rule: ");
+        v4v_viptables_print_rule(node);
+#endif /* V4V_DEBUG */
+        list_del(tmp);
+        xfree(node);
+    }
+
+    return 0;
+}
+
+static size_t
+v4v_viptables_list(struct domain *src_d,
+                   XEN_GUEST_HANDLE(v4v_viptables_list_t) list_hnd)
+{
+    struct list_head *ptr;
+    struct v4v_viptables_rule_node *node;
+    struct v4v_viptables_list rules_list;
+    uint32_t nbrules;
+    XEN_GUEST_HANDLE(v4v_viptables_rule_t) guest_rules;
+
+    ASSERT(rw_is_locked(&viprules_lock));
+
+    memset(&rules_list, 0, sizeof (rules_list));
+    if ( copy_from_guest(&rules_list, list_hnd, 1) )
+        return -EFAULT;
+
+    ptr = viprules.next;
+    while ( rules_list.start_rule != 0 && ptr->next != &viprules )
+    {
+        ptr = ptr->next;
+        rules_list.start_rule--;
+    }
+
+    if ( rules_list.nb_rules == 0 )
+        return -EINVAL;
+
+    guest_rules = guest_handle_for_field(list_hnd, v4v_viptables_rule_t, rules[0]);
+
+    nbrules = 0;
+    while ( nbrules < rules_list.nb_rules && ptr != &viprules )
+    {
+        node = list_entry(ptr, struct v4v_viptables_rule_node, list);
+
+        if ( !guest_handle_okay(guest_rules, 1) )
+            break;
+
+        if ( copy_to_guest(guest_rules, &node->rule, 1) )
+            break;
+
+        guest_handle_add_offset(guest_rules, 1);
+
+        nbrules++;
+        ptr = ptr->next;
+    }
+
+    rules_list.nb_rules = nbrules;
+    if ( copy_field_to_guest(list_hnd, &rules_list, nb_rules) )
+        return -EFAULT;
+
+    return 0;
+}
+
+static size_t
+v4v_viptables_check(v4v_addr_t * src, v4v_addr_t * dst)
+{
+    struct list_head *ptr;
+    struct v4v_viptables_rule_node *node;
+    size_t ret = 0; /* Defaulting to ACCEPT */
+
+    read_lock(&viprules_lock);
+
+    list_for_each(ptr, &viprules)
+    {
+        node = list_entry(ptr, struct v4v_viptables_rule_node, list);
+
+        if ( (node->rule.src.domain == V4V_DOMID_ANY ||
+              node->rule.src.domain == src->domain) &&
+             (node->rule.src.port == V4V_PORT_ANY ||
+              node->rule.src.port == src->port) &&
+             (node->rule.dst.domain == V4V_DOMID_ANY ||
+              node->rule.dst.domain == dst->domain) &&
+             (node->rule.dst.port == V4V_PORT_ANY ||
+              node->rule.dst.port == dst->port) )
+        {
+            ret = !node->rule.accept;
+            break;
+        }
+    }
+
+    read_unlock(&viprules_lock);
+    return ret;
+}
+
+/*
+ * Hypercall to do the send
+ */
+static long
+v4v_sendv(struct domain *src_d, v4v_addr_t * src_addr,
+          v4v_addr_t * dst_addr, uint32_t proto,
+          internal_v4v_iov_t iovs, size_t niov)
+{
+    struct domain *dst_d;
+    v4v_ring_id_t src_id;
+    struct v4v_ring_info *ring_info;
+    int ret = 0;
+
+    if ( !dst_addr )
+    {
+        v4v_dprintk("!dst_addr, EINVAL\n");
+        return -EINVAL;
+    }
+
+    read_lock(&v4v_lock);
+    if ( !src_d->v4v )
+    {
+        read_unlock(&v4v_lock);
+        v4v_dprintk("!src_d->v4v, EINVAL\n");
+        return -EINVAL;
+    }
+
+    src_id.addr.port = src_addr->port;
+    src_id.addr.domain = src_d->domain_id;
+    src_id.partner = dst_addr->domain;
+
+    dst_d = get_domain_by_id(dst_addr->domain);
+    if ( !dst_d )
+    {
+        read_unlock(&v4v_lock);
+        v4v_dprintk("!dst_d, ECONNREFUSED\n");
+        return -ECONNREFUSED;
+    }
+
+    if ( v4v_viptables_check(src_addr, dst_addr) != 0 )
+    {
+        read_unlock(&v4v_lock);
+        gdprintk(XENLOG_WARNING,
+                 "V4V: VIPTables REJECTED %i:%i -> %i:%i\n",
+                 src_addr->domain, src_addr->port,
+                 dst_addr->domain, dst_addr->port);
+        return -ECONNREFUSED;
+    }
+
+    do
+    {
+        if ( !dst_d->v4v )
+        {
+            v4v_dprintk("dst_d->v4v, ECONNREFUSED\n");
+            ret = -ECONNREFUSED;
+            break;
+        }
+
+        read_lock(&dst_d->v4v->lock);
+        ring_info =
+            v4v_ring_find_info_by_addr(dst_d, dst_addr, src_addr->domain);
+
+        if ( !ring_info )
+        {
+            ret = -ECONNREFUSED;
+            v4v_dprintk(" !ring_info, ECONNREFUSED\n");
+        }
+        else
+        {
+            long len = v4v_iov_count(iovs, niov);
+
+            if ( len < 0 )
+            {
+                ret = len;
+                break;
+            }
+
+            spin_lock(&ring_info->lock);
+            ret =
+                v4v_ringbuf_insertv(dst_d, ring_info, &src_id, proto, iovs,
+                        niov, len);
+            if ( ret == -EAGAIN )
+            {
+                v4v_dprintk("v4v_ringbuf_insertv failed, EAGAIN\n");
+                /* Schedule a wake up on the event channel when space is there */
+                if ( v4v_pending_requeue(ring_info, src_d->domain_id, len) )
+                {
+                    v4v_dprintk("v4v_pending_requeue failed, ENOMEM\n");
+                    ret = -ENOMEM;
+                }
+            }
+            spin_unlock(&ring_info->lock);
+
+            if ( ret >= 0 )
+            {
+                v4v_signal_domain(dst_d);
+            }
+
+        }
+        read_unlock(&dst_d->v4v->lock);
+
+    }
+    while ( 0 );
+
+    put_domain(dst_d);
+    read_unlock(&v4v_lock);
+    return ret;
+}
+
+static void
+v4v_info(struct domain *d, v4v_info_t *info)
+{
+    read_lock(&d->v4v->lock);
+    info->ring_magic = V4V_RING_MAGIC;
+    info->data_magic = V4V_RING_DATA_MAGIC;
+    info->evtchn = d->v4v->evtchn_port;
+    read_unlock(&d->v4v->lock);
+}
+
+/*
+ * hypercall glue
+ */
+long
+do_v4v_op(int cmd, XEN_GUEST_HANDLE(void) arg1,
+          XEN_GUEST_HANDLE(void) arg2,
+          uint32_t arg3, uint32_t arg4)
+{
+    struct domain *d = current->domain;
+    long rc = -EFAULT;
+
+    v4v_dprintk("->do_v4v_op(%d,%p,%p,%d,%d)\n", cmd,
+                (void *)arg1.p, (void *)arg2.p, (int) arg3, (int) arg4);
+
+    domain_lock(d);
+    switch (cmd)
+    {
+        case V4VOP_register_ring:
+            {
+                XEN_GUEST_HANDLE(v4v_ring_t) ring_hnd =
+                    guest_handle_cast(arg1, v4v_ring_t);
+                XEN_GUEST_HANDLE(xen_pfn_t) pfn_hnd =
+                    guest_handle_cast(arg2, xen_pfn_t);
+                uint32_t npage = arg3;
+                if ( unlikely(!guest_handle_okay(ring_hnd, 1)) )
+                    goto out;
+                if ( unlikely(!guest_handle_okay(pfn_hnd, npage)) )
+                    goto out;
+                rc = v4v_ring_add(d, ring_hnd, npage, pfn_hnd);
+                break;
+            }
+        case V4VOP_unregister_ring:
+            {
+                XEN_GUEST_HANDLE(v4v_ring_t) ring_hnd =
+                    guest_handle_cast(arg1, v4v_ring_t);
+                if ( unlikely(!guest_handle_okay(ring_hnd, 1)) )
+                    goto out;
+                rc = v4v_ring_remove(d, ring_hnd);
+                break;
+            }
+        case V4VOP_send:
+            {
+                uint32_t len = arg3;
+                uint32_t message_type = arg4;
+                v4v_iov_t iov;
+                internal_v4v_iov_t iovs;
+                XEN_GUEST_HANDLE(v4v_send_addr_t) addr_hnd =
+                    guest_handle_cast(arg1, v4v_send_addr_t);
+                v4v_send_addr_t addr;
+
+                if ( unlikely(!guest_handle_okay(addr_hnd, 1)) )
+                    goto out;
+                if ( copy_from_guest(&addr, addr_hnd, 1) )
+                    goto out;
+
+                iov.iov_base = (uint64_t)arg2.p; // FIXME
+                iov.iov_len = len;
+                iovs.xen_iov = &iov;
+                rc = v4v_sendv(d, &addr.src, &addr.dst, message_type, iovs, 1);
+                break;
+            }
+        case V4VOP_sendv:
+            {
+                internal_v4v_iov_t iovs;
+                uint32_t niov = arg3;
+                uint32_t message_type = arg4;
+                XEN_GUEST_HANDLE(v4v_send_addr_t) addr_hnd =
+                    guest_handle_cast(arg1, v4v_send_addr_t);
+                v4v_send_addr_t addr;
+
+                memset(&iovs, 0, sizeof (iovs));
+                iovs.guest_iov = guest_handle_cast(arg2, v4v_iov_t);
+
+                if ( unlikely(!guest_handle_okay(addr_hnd, 1)) )
+                    goto out;
+                if ( copy_from_guest(&addr, addr_hnd, 1) )
+                    goto out;
+
+                if ( unlikely(!guest_handle_okay(iovs.guest_iov, niov)) )
+                    goto out;
+
+                rc = v4v_sendv(d, &addr.src, &addr.dst, message_type, iovs, niov);
+                break;
+            }
+        case V4VOP_notify:
+            {
+                XEN_GUEST_HANDLE(v4v_ring_data_t) ring_data_hnd =
+                    guest_handle_cast(arg1, v4v_ring_data_t);
+                rc = v4v_notify(d, ring_data_hnd);
+                break;
+            }
+        case V4VOP_viptables_add:
+            {
+                uint32_t position = arg3;
+                XEN_GUEST_HANDLE(v4v_viptables_rule_t) rule_hnd =
+                    guest_handle_cast(arg1, v4v_viptables_rule_t);
+                rc = -EPERM;
+                if ( !IS_PRIV(d) )
+                    goto out;
+
+                write_lock(&viprules_lock);
+                rc = v4v_viptables_add(d, rule_hnd, position);
+                write_unlock(&viprules_lock);
+                break;
+            }
+        case V4VOP_viptables_del:
+            {
+                uint32_t position = arg3;
+                XEN_GUEST_HANDLE(v4v_viptables_rule_t) rule_hnd =
+                    guest_handle_cast(arg1, v4v_viptables_rule_t);
+                rc = -EPERM;
+                if ( !IS_PRIV(d) )
+                    goto out;
+
+                write_lock(&viprules_lock);
+                rc = v4v_viptables_del(d, rule_hnd, position);
+                write_unlock(&viprules_lock);
+                break;
+            }
+        case V4VOP_viptables_list:
+            {
+                XEN_GUEST_HANDLE(v4v_viptables_list_t) rules_list_hnd =
+                    guest_handle_cast(arg1, v4v_viptables_list_t);
+                rc = -EPERM;
+                if ( !IS_PRIV(d) )
+                    goto out;
+
+                rc = -EFAULT;
+                if ( unlikely(!guest_handle_okay(rules_list_hnd, 1)) )
+                    goto out;
+
+                read_lock(&viprules_lock);
+                rc = v4v_viptables_list(d, rules_list_hnd);
+                read_unlock(&viprules_lock);
+                break;
+            }
+        case V4VOP_info:
+            {
+                XEN_GUEST_HANDLE(v4v_info_t) info_hnd =
+                    guest_handle_cast(arg1, v4v_info_t);
+                v4v_info_t info;
+
+                if ( unlikely(!guest_handle_okay(info_hnd, 1)) )
+                    goto out;
+                v4v_info(d, &info);
+                if ( copy_to_guest(info_hnd, &info, 1) )
+                    goto out;
+                rc = 0;
+                break;
+            }
+        default:
+            rc = -ENOSYS;
+            break;
+    }
+out:
+    domain_unlock(d);
+    v4v_dprintk("<-do_v4v_op()=%d\n", (int)rc);
+    return rc;
+}
+
+/*
+ * init
+ */
+
+void
+v4v_destroy(struct domain *d)
+{
+    int i;
+
+    BUG_ON(!d->is_dying);
+    write_lock(&v4v_lock);
+
+    v4v_dprintk("d->v=%p\n", d->v4v);
+
+    if ( d->v4v )
+    {
+        for ( i = 0; i < V4V_HTABLE_SIZE; ++i )
+        {
+            struct hlist_node *node, *next;
+            struct v4v_ring_info *ring_info;
+
+            hlist_for_each_entry_safe(ring_info, node,
+                    next, &d->v4v->ring_hash[i],
+                    node)
+            {
+                v4v_ring_remove_info(d, ring_info);
+            }
+        }
+    }
+
+    d->v4v = NULL;
+    write_unlock(&v4v_lock);
+}
+
+int
+v4v_init(struct domain *d)
+{
+    struct v4v_domain *v4v;
+    evtchn_port_t port;
+    int i;
+    int rc;
+
+    v4v = xmalloc(struct v4v_domain);
+    if ( !v4v )
+        return -ENOMEM;
+
+    rc = evtchn_alloc_unbound_domain(d, &port, d->domain_id);
+    if ( rc )
+        return rc;
+
+    rwlock_init(&v4v->lock);
+
+    v4v->evtchn_port = port;
+    for ( i = 0; i < V4V_HTABLE_SIZE; ++i )
+        INIT_HLIST_HEAD(&v4v->ring_hash[i]);
+
+    write_lock(&v4v_lock);
+    d->v4v = v4v;
+    write_unlock(&v4v_lock);
+
+    return 0;
+}
+
+
+/*
+ * debug
+ */
+
+static void
+dump_domain_ring(struct domain *d, struct v4v_ring_info *ring_info)
+{
+    uint32_t rx_ptr;
+
+    printk(KERN_ERR "  ring: domid=%d port=0x%08x partner=%d npage=%d\n",
+           (int)d->domain_id, (int)ring_info->id.addr.port,
+           (int)ring_info->id.partner, (int)ring_info->npage);
+
+    if ( v4v_ringbuf_get_rx_ptr(d, ring_info, &rx_ptr) )
+    {
+        printk(KERN_ERR "   Failed to read rx_ptr\n");
+        return;
+    }
+
+    printk(KERN_ERR "   tx_ptr=%d rx_ptr=%d len=%d\n",
+           (int)ring_info->tx_ptr, (int)rx_ptr, (int)ring_info->len);
+}
+
+static void
+dump_domain(struct domain *d)
+{
+    int i;
+
+    printk(KERN_ERR " domain %d:\n", (int)d->domain_id);
+
+    read_lock(&d->v4v->lock);
+
+    for ( i = 0; i < V4V_HTABLE_SIZE; ++i )
+    {
+        struct hlist_node *node;
+        struct v4v_ring_info *ring_info;
+
+        hlist_for_each_entry(ring_info, node, &d->v4v->ring_hash[i], node)
+            dump_domain_ring(d, ring_info);
+    }
+
+    printk(KERN_ERR "  event channel: %d\n",  d->v4v->evtchn_port);
+    read_unlock(&d->v4v->lock);
+
+    printk(KERN_ERR "\n");
+    v4v_signal_domain(d);
+}
+
+static void
+dump_state(unsigned char key)
+{
+    struct domain *d;
+
+    printk(KERN_ERR "\n\nV4V:\n");
+    read_lock(&v4v_lock);
+
+    rcu_read_lock(&domlist_read_lock);
+
+    for_each_domain(d)
+        dump_domain(d);
+
+    rcu_read_unlock(&domlist_read_lock);
+
+    read_unlock(&v4v_lock);
+}
+
+struct keyhandler v4v_info_keyhandler =
+{
+    .diagnostic = 1,
+    .u.fn = dump_state,
+    .desc = "dump v4v states and interupt"
+};
+
+static int __init
+setup_dump_rings(void)
+{
+    register_keyhandler('4', &v4v_info_keyhandler);
+    return 0;
+}
+
+__initcall(setup_dump_rings);
+
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/public/v4v.h b/xen/include/public/v4v.h
new file mode 100644
index 0000000..6a9d55c
--- /dev/null
+++ b/xen/include/public/v4v.h
@@ -0,0 +1,308 @@
+/******************************************************************************
+ * V4V
+ *
+ * Version 2 of v2v (Virtual-to-Virtual)
+ *
+ * Copyright (c) 2010, Citrix Systems
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#ifndef __XEN_PUBLIC_V4V_H__
+#define __XEN_PUBLIC_V4V_H__
+
+#include "xen.h"
+#include "event_channel.h"
+
+/*
+ * Structure definitions
+ */
+
+#define V4V_RING_MAGIC          0xa822f72bb0b9d8ccULL
+#define V4V_RING_DATA_MAGIC	0x45fe852220b801d4ULL
+
+#define V4V_MESSAGE_DGRAM       0x3c2c1db8
+#define V4V_MESSAGE_STREAM 	0x70f6a8e5
+
+#define V4V_DOMID_ANY           DOMID_INVALID
+#define V4V_PORT_ANY            0
+
+typedef struct v4v_iov
+{
+    uint64_t iov_base;
+    uint32_t iov_len;
+    uint32_t pad;
+} v4v_iov_t;
+
+typedef struct v4v_addr
+{
+    uint32_t port;
+    domid_t domain;
+    uint16_t pad;
+} v4v_addr_t;
+
+typedef struct v4v_ring_id
+{
+    v4v_addr_t addr;
+    domid_t partner;
+    uint16_t pad;
+} v4v_ring_id_t;
+
+typedef struct
+{
+    v4v_addr_t src;
+    v4v_addr_t dst;
+} v4v_send_addr_t;
+
+/*
+ * v4v_ring
+ * id: xen only looks at this during register/unregister
+ *     and will fill in id.addr.domain
+ * rx_ptr: rx pointer, modified by domain
+ * tx_ptr: tx pointer, modified by xen
+ *
+ */
+struct v4v_ring
+{
+    uint64_t magic;
+    v4v_ring_id_t id;
+    uint32_t len;
+    uint32_t rx_ptr;
+    uint32_t tx_ptr;
+    uint8_t reserved[32];
+    uint8_t ring[];
+};
+typedef struct v4v_ring v4v_ring_t;
+
+#define V4V_RING_DATA_F_EMPTY       (1U << 0) /* Ring is empty */
+#define V4V_RING_DATA_F_EXISTS      (1U << 1) /* Ring exists */
+#define V4V_RING_DATA_F_PENDING     (1U << 2) /* Pending interrupt exists - do not
+                                               * rely on this field - for
+                                               * profiling only */
+#define V4V_RING_DATA_F_SUFFICIENT  (1U << 3) /* Sufficient space to queue
+                                               * space_required bytes exists */
+
+typedef struct v4v_ring_data_ent
+{
+    v4v_addr_t ring;
+    uint16_t flags;
+    uint16_t pad1;
+    uint32_t pad2;
+    uint32_t space_required;
+    uint32_t max_message_size;
+} v4v_ring_data_ent_t;
+
+typedef struct v4v_ring_data
+{
+    uint64_t magic;
+    uint32_t nent;
+    uint32_t pad;
+    uint64_t reserved[4];
+    v4v_ring_data_ent_t data[];
+} v4v_ring_data_t;
+
+struct v4v_info
+{
+    uint64_t ring_magic;
+    uint64_t data_magic;
+    evtchn_port_t evtchn;
+    uint32_t pad;
+};
+typedef struct v4v_info v4v_info_t;
+
+#define V4V_SHF_SYN		(1 << 0)
+#define V4V_SHF_ACK		(1 << 1)
+#define V4V_SHF_RST		(1 << 2)
+
+#define V4V_SHF_PING		(1 << 8)
+#define V4V_SHF_PONG		(1 << 9)
+
+struct v4v_stream_header
+{
+    uint32_t flags;
+    uint32_t conid;
+};
+
+struct v4v_ring_message_header
+{
+    uint32_t len;
+    uint32_t pad0;
+    v4v_addr_t source;
+    uint32_t message_type;
+    uint32_t pad1;
+    uint8_t data[];
+};
+
+typedef struct v4v_viptables_rule
+{
+    v4v_addr_t src;
+    v4v_addr_t dst;
+    uint32_t accept;
+    uint32_t pad;
+} v4v_viptables_rule_t;
+
+typedef struct v4v_viptables_list
+{
+    uint32_t start_rule;
+    uint32_t nb_rules;
+    struct v4v_viptables_rule rules[];
+} v4v_viptables_list_t;
+
+/*
+ * HYPERCALLS
+ */
+
+/*
+ * V4VOP_register_ring
+ *
+ * Registers a ring with Xen. If a ring with the same v4v_ring_id exists,
+ * the hypercall will return -EEXIST.
+ *
+ * do_v4v_op(V4VOP_unregister_ring,
+ *           XEN_GUEST_HANDLE(v4v_ring_t), XEN_GUEST_HANDLE(xen_pfn_t),
+ *           npage, 0)
+ */
+#define V4VOP_register_ring 	1
+
+
+/*
+ * V4VOP_unregister_ring
+ *
+ * Unregister a ring.
+ *
+ * do_v4v_op(V4VOP_unregister_ring,
+ *           XEN_GUEST_HANDLE(v4v_ring),
+ *           NULL, 0, 0)
+ */
+#define V4VOP_unregister_ring 	2
+
+/*
+ * V4VOP_send
+ *
+ * Sends len bytes of buf to dst, giving src as the source address (xen will
+ * ignore src->domain and put your domain in the actually message), xen
+ * first looks for a ring with id.addr==dst and id.partner==sending_domain
+ * if that fails it looks for id.addr==dst and id.partner==DOMID_ANY.
+ * message_type is the 32 bit number used from the message
+ * most likely V4V_MESSAGE_DGRAM or V4V_MESSAGE_STREAM. If insufficient space exists
+ * it will return -EAGAIN and xen will twing the V4V_INTERRUPT when
+ * sufficient space becomes available
+ *
+ * do_v4v_op(V4VOP_send,
+ *           XEN_GUEST_HANDLE(v4v_send_addr_t) addr,
+ *           XEN_GUEST_HANDLE(void) buf,
+ *           uint32_t len,
+ *           uint32_t message_type)
+ */
+#define V4VOP_send 		3
+
+
+/*
+ * V4VOP_notify
+ *
+ * Asks xen for information about other rings in the system
+ *
+ * ent->ring is the v4v_addr_t of the ring you want information on
+ * the same matching rules are used as for V4VOP_send.
+ *
+ * ent->space_required  if this field is not null xen will check
+ * that there is space in the destination ring for this many bytes
+ * of payload. If there is it will set the V4V_RING_DATA_F_SUFFICIENT
+ * and CANCEL any pending interrupt for that ent->ring, if insufficient
+ * space is available it will schedule an interrupt and the flag will
+ * not be set.
+ *
+ * The flags are set by xen when notify replies
+ * V4V_RING_DATA_F_EMPTY	ring is empty
+ * V4V_RING_DATA_F_PENDING	interrupt is pending - don't rely on this
+ * V4V_RING_DATA_F_SUFFICIENT	sufficient space for space_required is there
+ * V4V_RING_DATA_F_EXISTS	ring exists
+ *
+ * do_v4v_op(V4VOP_notify,
+ *           XEN_GUEST_HANDLE(v4v_ring_data_ent) ent,
+ *           NULL, 0, 0)
+ */
+#define V4VOP_notify 		4
+
+/*
+ * V4VOP_sendv
+ *
+ * Identical to V4VOP_send except rather than buf and len it takes
+ * an array of v4v_iov and a length of the array.
+ *
+ * do_v4v_op(V4VOP_sendv,
+ *           XEN_GUEST_HANDLE(v4v_send_addr_t) addr,
+ *           XEN_GUEST_HANDLE(v4v_iov) iov,
+ *           uint32_t niov,
+ *           uint32_t message_type)
+ */
+#define V4VOP_sendv		5
+
+/*
+ * V4VOP_viptables_add
+ *
+ * Insert a filtering rules after a given position.
+ *
+ * do_v4v_op(V4VOP_viptables_add,
+ *           XEN_GUEST_HANDLE(v4v_viptables_rule_t) rule,
+ *           NULL,
+ *           uint32_t position, 0)
+ */
+#define V4VOP_viptables_add     6
+
+/*
+ * V4VOP_viptables_del
+ *
+ * Delete a filtering rules at a position or the rule
+ * that matches "rule".
+ *
+ * do_v4v_op(V4VOP_viptables_del,
+ *           XEN_GUEST_HANDLE(v4v_viptables_rule_t) rule,
+ *           NULL,
+ *           uint32_t position, 0)
+ */
+#define V4VOP_viptables_del     7
+
+/*
+ * V4VOP_viptables_list
+ *
+ * Delete a filtering rules at a position or the rule
+ * that matches "rule".
+ *
+ * do_v4v_op(V4VOP_viptables_list,
+ *           XEN_GUEST_HANDLE(v4v_vitpables_list_t) list,
+ *           NULL, 0, 0)
+ */
+#define V4VOP_viptables_list    8
+
+/*
+ * V4VOP_info
+ *
+ * Returns v4v info for the current domain (domain that issued the hypercall).
+ *      - V4V magic number
+ *      - event channel port (for current domain)
+ *
+ * do_v4v_op(V4VOP_info,
+ *           XEN_GUEST_HANDLE(v4v_info_t) info,
+ *           NULL, 0, 0)
+ */
+#define V4VOP_info              9
+
+#endif /* __XEN_PUBLIC_V4V_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 8def420..7a3fade 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -97,7 +97,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_pfn_t);
 #define __HYPERVISOR_domctl               36
 #define __HYPERVISOR_kexec_op             37
 #define __HYPERVISOR_tmem_op              38
-#define __HYPERVISOR_xc_reserved_op       39 /* reserved for XenClient */
+#define __HYPERVISOR_v4v_op               39
 
 /* Architecture-specific hypercall definitions. */
 #define __HYPERVISOR_arch_0               48
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 53804c8..296de52 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -23,6 +23,7 @@
 #include <public/sysctl.h>
 #include <public/vcpu.h>
 #include <public/mem_event.h>
+#include <xen/v4v.h>
 
 #ifdef CONFIG_COMPAT
 #include <compat/vcpu.h>
@@ -350,6 +351,9 @@ struct domain
     nodemask_t node_affinity;
     unsigned int last_alloc_node;
     spinlock_t node_affinity_lock;
+
+    /* v4v */
+    struct v4v_domain *v4v;
 };
 
 struct domain_setup_info
diff --git a/xen/include/xen/v4v.h b/xen/include/xen/v4v.h
new file mode 100644
index 0000000..05d20b5
--- /dev/null
+++ b/xen/include/xen/v4v.h
@@ -0,0 +1,133 @@
+/******************************************************************************
+ * V4V
+ *
+ * Version 2 of v2v (Virtual-to-Virtual)
+ *
+ * Copyright (c) 2010, Citrix Systems
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#ifndef __V4V_PRIVATE_H__
+#define __V4V_PRIVATE_H__
+
+#include <xen/config.h>
+#include <xen/types.h>
+#include <xen/spinlock.h>
+#include <xen/smp.h>
+#include <xen/shared.h>
+#include <xen/list.h>
+#include <public/v4v.h>
+
+#define V4V_HTABLE_SIZE 32
+
+/*
+ * Handlers
+ */
+
+DEFINE_XEN_GUEST_HANDLE (v4v_iov_t);
+DEFINE_XEN_GUEST_HANDLE (v4v_addr_t);
+DEFINE_XEN_GUEST_HANDLE (v4v_send_addr_t);
+DEFINE_XEN_GUEST_HANDLE (v4v_ring_t);
+DEFINE_XEN_GUEST_HANDLE (v4v_ring_data_ent_t);
+DEFINE_XEN_GUEST_HANDLE (v4v_ring_data_t);
+DEFINE_XEN_GUEST_HANDLE (v4v_info_t);
+
+DEFINE_XEN_GUEST_HANDLE (v4v_viptables_rule_t);
+DEFINE_XEN_GUEST_HANDLE (v4v_viptables_list_t);
+
+/*
+ * Helper functions
+ */
+
+static inline uint16_t
+v4v_hash_fn (v4v_ring_id_t *id)
+{
+    uint16_t ret;
+    ret = (uint16_t) (id->addr.port >> 16);
+    ret ^= (uint16_t) id->addr.port;
+    ret ^= id->addr.domain;
+    ret ^= id->partner;
+
+    ret &= (V4V_HTABLE_SIZE-1);
+
+    return ret;
+}
+
+struct v4v_pending_ent
+{
+    struct hlist_node node;
+    domid_t id;
+    uint32_t len;
+};
+
+
+struct v4v_ring_info
+{
+    /* next node in the hash, protected by L2  */
+    struct hlist_node node;
+    /* this ring's id, protected by L2 */
+    v4v_ring_id_t id;
+    /* L3 */
+    spinlock_t lock;
+    /* cached length of the ring (from ring->len), protected by L3 */
+    uint32_t len;
+    uint32_t npage;
+    /* cached tx pointer location, protected by L3 */
+    uint32_t tx_ptr;
+    /* guest ring, protected by L3 */
+    XEN_GUEST_HANDLE(v4v_ring_t) ring;
+    /* mapped ring pages protected by L3*/
+    uint8_t **mfn_mapping;
+    /* list of mfns of guest ring */
+    mfn_t *mfns;
+    /* list of struct v4v_pending_ent for this ring, L3 */
+    struct hlist_head pending;
+};
+
+/*
+ * The value of the v4v element in a struct domain is
+ * protected by the global lock L1
+ */
+struct v4v_domain
+{
+    /* L2 */
+    rwlock_t lock;
+    /* event channel */
+    evtchn_port_t evtchn_port;
+    /* protected by L2 */
+    struct hlist_head ring_hash[V4V_HTABLE_SIZE];
+};
+
+typedef struct v4v_viptables_rule_node
+{
+    struct list_head list;
+    v4v_viptables_rule_t rule;
+} v4v_viptables_rule_node_t;
+
+void v4v_destroy(struct domain *d);
+int v4v_init(struct domain *d);
+long do_v4v_op (int cmd,
+                XEN_GUEST_HANDLE (void) arg1,
+                XEN_GUEST_HANDLE (void) arg2,
+                uint32_t arg3,
+                uint32_t arg4);
+
+#endif /* __V4V_PRIVATE_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] xen: Add V4V implementation
  2012-09-13 18:09 ` [PATCH 4/4] xen: Add V4V implementation Jean Guyader
@ 2012-09-14  9:54   ` Jan Beulich
  0 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2012-09-14  9:54 UTC (permalink / raw)
  To: Jean Guyader; +Cc: tim, xen-devel

>>> On 13.09.12 at 20:09, Jean Guyader <jean.guyader@citrix.com> wrote:
>+typedef struct v4v_iov
>+{
>+    uint64_t iov_base;
>+    uint32_t iov_len;
>+} v4v_iov_t;

Lost padding here?

>+struct v4v_ring
>+{
>+    uint64_t magic;
>+    v4v_ring_id_t id;
>+    uint32_t len;
>+    uint32_t rx_ptr;
>+    uint32_t tx_ptr;
>+    uint8_t reserved[32];
>+    uint8_t ring[0];
>+};

Zero-sized arrays are not standard C (not even C11 iirc), this is
purely a GNU extension. This appears again elsewhere in this file.

>+typedef struct v4v_ring_data_ent
>+{
>+    v4v_addr_t ring;
>+    uint16_t flags;
>+    uint16_t pad;
>+    uint32_t space_required;
>+    uint32_t max_message_size;
>+} v4v_ring_data_ent_t;

For me, this sums up to 20 bytes.

>+typedef struct v4v_ring_data
>+{
>+    uint64_t magic;
>+    uint32_t nent;
>+    uint32_t pad;
>+    uint64_t reserved[4];
>+    v4v_ring_data_ent_t data[0];
>+} v4v_ring_data_t;

Consequently, sizeof(v4v_ring_data_t) will differ for 32- and
64-bit x86 guests. I can't tell whether that would matter
though (due to the intended, but again wrongly done,
trailing variable size array).

>+ * V4VOP_register_ring

This conflicts with ...

>+ *
>+ * Registers a ring with Xen, if a ring with the same v4v_ring_id exists,
>+ * this ring takes its place, registration will not change tx_ptr
>+ * unless it is invalid
>+ *
>+ * do_v4v_op(V4VOP_unregister_ring,

... this.

>+ *           v4v_ring, XEN_GUEST_HANDLE(xen_pfn_t),

Missing indirection here? I doubt you want a structure passed
by value... (Again reoccurs further down.)

>+ * V4VOP_unregister_ring

Again conflicting with ...

>+ *
>+ * Unregister a ring.
>+ *
>+ * do_v4v_op(V4VOP_send, v4v_ring, NULL, 0, 0)

... this.

>+ * do_v4v_op(V4VOP_send,
>+ *           v4v_send_addr_t addr,
>+ *           void* buf,

XEN_GUEST_HANDLE(void) or, given this is a send, probably rather
XEN_GUEST_HANDLE(const_void). But then again this would
conflict with the real do_v4v_op() declaration.

Jan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 4/4] xen: Add V4V implementation
  2012-09-13 18:09 [RFC][PATCH 0/4] Add V4V to Xen Jean Guyader
@ 2012-09-13 18:09 ` Jean Guyader
  2012-09-14  9:54   ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Jean Guyader @ 2012-09-13 18:09 UTC (permalink / raw)
  To: xen-devel; +Cc: tim, Jean Guyader, JBeulich

[-- Attachment #1: Type: text/plain, Size: 946 bytes --]


Setup of v4v domains a domain gets created and cleanup
when a domain die. Wire up the v4v hypercall.

Include v4v internal and public headers.

Signed-off-by: Jean Guyader <jean.guyader@citrix.com>
---
 xen/arch/x86/hvm/hvm.c             |    9 +-
 xen/arch/x86/x86_32/entry.S        |    2 +
 xen/arch/x86/x86_64/compat/entry.S |    2 +
 xen/arch/x86/x86_64/entry.S        |    2 +
 xen/common/Makefile                |    1 +
 xen/common/domain.c                |   13 +-
 xen/common/v4v.c                   | 1907 ++++++++++++++++++++++++++++++++++++
 xen/include/public/v4v.h           |  305 ++++++
 xen/include/public/xen.h           |    2 +-
 xen/include/xen/sched.h            |    4 +
 xen/include/xen/v4v.h              |  133 +++
 11 files changed, 2375 insertions(+), 5 deletions(-)
 create mode 100644 xen/common/v4v.c
 create mode 100644 xen/include/public/v4v.h
 create mode 100644 xen/include/xen/v4v.h


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0004-xen-Add-V4V-implementation.patch --]
[-- Type: text/x-patch; name="0004-xen-Add-V4V-implementation.patch", Size: 70117 bytes --]

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 0576a24..62c636a 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3134,7 +3134,8 @@ static hvm_hypercall_t *hvm_hypercall32_table[NR_hypercalls] = {
     HYPERCALL(set_timer_op),
     HYPERCALL(hvm_op),
     HYPERCALL(sysctl),
-    HYPERCALL(tmem_op)
+    HYPERCALL(tmem_op),
+    HYPERCALL(v4v_op)
 };
 
 #else /* defined(__x86_64__) */
@@ -3219,7 +3220,8 @@ static hvm_hypercall_t *hvm_hypercall64_table[NR_hypercalls] = {
     HYPERCALL(set_timer_op),
     HYPERCALL(hvm_op),
     HYPERCALL(sysctl),
-    HYPERCALL(tmem_op)
+    HYPERCALL(tmem_op),
+    HYPERCALL(v4v_op)
 };
 
 #define COMPAT_CALL(x)                                        \
@@ -3236,7 +3238,8 @@ static hvm_hypercall_t *hvm_hypercall32_table[NR_hypercalls] = {
     COMPAT_CALL(set_timer_op),
     HYPERCALL(hvm_op),
     HYPERCALL(sysctl),
-    HYPERCALL(tmem_op)
+    HYPERCALL(tmem_op),
+    HYPERCALL(v4v_op)
 };
 
 #endif /* defined(__x86_64__) */
diff --git a/xen/arch/x86/x86_32/entry.S b/xen/arch/x86/x86_32/entry.S
index 2982679..5b1f55b 100644
--- a/xen/arch/x86/x86_32/entry.S
+++ b/xen/arch/x86/x86_32/entry.S
@@ -700,6 +700,7 @@ ENTRY(hypercall_table)
         .long do_domctl
         .long do_kexec_op
         .long do_tmem_op
+        .long do_v4v_op
         .rept __HYPERVISOR_arch_0-((.-hypercall_table)/4)
         .long do_ni_hypercall
         .endr
@@ -748,6 +749,7 @@ ENTRY(hypercall_args_table)
         .byte 1 /* do_domctl            */
         .byte 2 /* do_kexec_op          */
         .byte 1 /* do_tmem_op           */
+        .byte 5 /* do_v4v_op		*/
         .rept __HYPERVISOR_arch_0-(.-hypercall_args_table)
         .byte 0 /* do_ni_hypercall      */
         .endr
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index f49ff2d..6b838d3 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -414,6 +414,7 @@ ENTRY(compat_hypercall_table)
         .quad do_domctl
         .quad compat_kexec_op
         .quad do_tmem_op
+        .quad do_v4v_op
         .rept __HYPERVISOR_arch_0-((.-compat_hypercall_table)/8)
         .quad compat_ni_hypercall
         .endr
@@ -462,6 +463,7 @@ ENTRY(compat_hypercall_args_table)
         .byte 1 /* do_domctl                */
         .byte 2 /* compat_kexec_op          */
         .byte 1 /* do_tmem_op               */
+        .byte 5 /* do_v4v_op		    */
         .rept __HYPERVISOR_arch_0-(.-compat_hypercall_args_table)
         .byte 0 /* compat_ni_hypercall      */
         .endr
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 997bc94..e6a7fdd 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -707,6 +707,7 @@ ENTRY(hypercall_table)
         .quad do_domctl
         .quad do_kexec_op
         .quad do_tmem_op
+        .quad do_v4v_op
         .rept __HYPERVISOR_arch_0-((.-hypercall_table)/8)
         .quad do_ni_hypercall
         .endr
@@ -755,6 +756,7 @@ ENTRY(hypercall_args_table)
         .byte 1 /* do_domctl            */
         .byte 2 /* do_kexec             */
         .byte 1 /* do_tmem_op           */
+        .byte 5 /* do_v4v_op		*/
         .rept __HYPERVISOR_arch_0-(.-hypercall_args_table)
         .byte 0 /* do_ni_hypercall      */
         .endr
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 9eba8bc..fe3c72c 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -45,6 +45,7 @@ obj-y += tmem_xen.o
 obj-y += radix-tree.o
 obj-y += rbtree.o
 obj-y += lzo.o
+obj-y += v4v.o
 
 obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma unlzo,$(n).init.o)
 
diff --git a/xen/common/domain.c b/xen/common/domain.c
index a1aa05e..94283e3 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -195,7 +195,8 @@ struct domain *domain_create(
 {
     struct domain *d, **pd;
     enum { INIT_xsm = 1u<<0, INIT_watchdog = 1u<<1, INIT_rangeset = 1u<<2,
-           INIT_evtchn = 1u<<3, INIT_gnttab = 1u<<4, INIT_arch = 1u<<5 };
+           INIT_evtchn = 1u<<3, INIT_gnttab = 1u<<4, INIT_arch = 1u<<5,
+           INIT_v4v = 1u<<6 };
     int err, init_status = 0;
     int poolid = CPUPOOLID_NONE;
 
@@ -307,6 +308,13 @@ struct domain *domain_create(
         spin_unlock(&domlist_update_lock);
     }
 
+    if ( !is_idle_domain(d) )
+    {
+        if ( v4v_init(d) != 0 )
+            goto fail;
+        init_status |= INIT_v4v;
+    }
+
     return d;
 
  fail:
@@ -315,6 +323,8 @@ struct domain *domain_create(
     xfree(d->mem_event);
     if ( init_status & INIT_arch )
         arch_domain_destroy(d);
+    if ( init_status & INIT_v4v )
+	v4v_destroy(d);
     if ( init_status & INIT_gnttab )
         grant_table_destroy(d);
     if ( init_status & INIT_evtchn )
@@ -468,6 +478,7 @@ int domain_kill(struct domain *d)
         domain_pause(d);
         d->is_dying = DOMDYING_dying;
         spin_barrier(&d->domain_lock);
+        v4v_destroy(d);
         evtchn_destroy(d);
         gnttab_release_mappings(d);
         tmem_destroy(d->tmem);
diff --git a/xen/common/v4v.c b/xen/common/v4v.c
new file mode 100644
index 0000000..a6a2648
--- /dev/null
+++ b/xen/common/v4v.c
@@ -0,0 +1,1907 @@
+/******************************************************************************
+ * V4V
+ *
+ * Version 2 of v2v (Virtual-to-Virtual)
+ *
+ * Copyright (c) 2010, Citrix Systems
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#include <xen/config.h>
+#include <xen/mm.h>
+#include <xen/compat.h>
+#include <xen/init.h>
+#include <xen/lib.h>
+#include <xen/errno.h>
+#include <xen/sched.h>
+#include <xen/domain.h>
+#include <xen/v4v.h>
+#include <xen/event.h>
+#include <xen/guest_access.h>
+#include <asm/paging.h>
+#include <asm/p2m.h>
+#include <xen/keyhandler.h>
+#include <asm/types.h>
+
+#ifdef V4V_DEBUG
+#define v4v_dprintk(format, args...)            \
+    do {                                        \
+        printk("%s:%d " format,                 \
+               __FILE__, __LINE__, ## args );   \
+    } while ( 1 == 0 )
+#else
+#define v4v_dprintk(format, ... ) (void)0
+#endif
+
+/*
+ * Messages on the ring are padded to 128 bits
+ * Len here refers to the exact length of the data not including the
+ * 128 bit header. The message uses
+ * ((len +0xf) & ~0xf) + sizeof(v4v_ring_message_header) bytes
+ */
+#define V4V_ROUNDUP(a) (((a) +0xf ) & ~0xf)
+
+DEFINE_XEN_GUEST_HANDLE(uint8_t);
+static struct v4v_ring_info *v4v_ring_find_info(struct domain *d,
+                                                v4v_ring_id_t *id);
+
+static struct v4v_ring_info *v4v_ring_find_info_by_addr(struct domain *d,
+                                                        struct v4v_addr *a,
+                                                        domid_t p);
+
+typedef struct internal_v4v_iov
+{
+    XEN_GUEST_HANDLE(v4v_iov_t) guest_iov;
+    v4v_iov_t                   *xen_iov;
+} internal_v4v_iov_t;
+
+struct list_head viprules = LIST_HEAD_INIT(viprules);
+
+/*
+ * locks
+ */
+
+/*
+ * locking is organized as follows:
+ *
+ * the global lock v4v_lock: L1 protects the v4v elements
+ * of all struct domain *d in the system, it does not
+ * protect any of the elements of d->v4v, just their
+ * addresses. By extension since the destruction of
+ * a domain with a non-NULL d->v4v will need to free
+ * the d->v4v pointer, holding this lock gauruntees
+ * that no domains pointers in which v4v is interested
+ * become invalid whilst this lock is held.
+ */
+
+static DEFINE_RWLOCK(v4v_lock); /* L1 */
+
+/*
+ * the lock d->v4v->lock: L2:  Read on protects the hash table and
+ * the elements in the hash_table d->v4v->ring_hash, and
+ * the node and id fields in struct v4v_ring_info in the
+ * hash table. Write on L2 protects all of the elements of
+ * struct v4v_ring_info. To take L2 you must already have R(L1)
+ * W(L1) implies W(L2) and L3
+ *
+ * the lock v4v_ring_info *ringinfo; ringinfo->lock: L3:
+ * protects len,tx_ptr the guest ring, the
+ * guest ring_data and the pending list. To take L3 you must
+ * already have R(L2). W(L2) implies L3
+ */
+
+/*
+ * lock to protect the filtering rules list: viprules
+ *
+ * The write lock is held for viptables_del and viptables_add
+ * The read lock is held for viptable_list
+ */
+static DEFINE_RWLOCK(viprules_lock);
+
+
+/*
+ * Debugs
+ */
+
+#ifdef V4V_DEBUG
+static void __attribute__((unused))
+v4v_hexdump(void *_p, int len)
+{
+    uint8_t *buf = (uint8_t *)_p;
+    int i, j;
+
+    for ( i = 0; i < len; i += 16 )
+    {
+        printk(KERN_ERR "%p:", &buf[i]);
+        for ( j = 0; j < 16; ++j )
+        {
+            int k = i + j;
+            if ( k < len )
+                printk(" %02x", buf[k]);
+            else
+                printk("   ");
+        }
+        printk(" ");
+
+        for ( j = 0; j < 16; ++j )
+        {
+            int k = i + j;
+            if ( k < len )
+                printk("%c", ((buf[k] > 32) && (buf[k] < 127)) ? buf[k] : '.');
+            else
+                printk(" ");
+        }
+        printk("\n");
+    }
+}
+#endif
+
+
+/*
+ * Event channel
+ */
+
+static void
+v4v_signal_domain(struct domain *d)
+{
+    v4v_dprintk("send guest VIRQ_V4V domid:%d\n", d->domain_id);
+
+    evtchn_send(d, d->v4v->evtchn_port);
+}
+
+static void
+v4v_signal_domid(domid_t id)
+{
+    struct domain *d = get_domain_by_id(id);
+    if ( !d )
+        return;
+    v4v_signal_domain(d);
+    put_domain(d);
+}
+
+
+/*
+ * ring buffer
+ */
+
+static void
+v4v_ring_unmap(struct v4v_ring_info *ring_info)
+{
+    int i;
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    for ( i = 0; i < ring_info->npage; ++i )
+    {
+        if ( !ring_info->mfn_mapping[i] )
+            continue;
+        v4v_dprintk("unmapping page %p from %p\n",
+                    (void*)mfn_x(ring_info->mfns[i]),
+                    ring_info->mfn_mapping[i]);
+
+        unmap_domain_page(ring_info->mfn_mapping[i]);
+        ring_info->mfn_mapping[i] = NULL;
+    }
+}
+
+static uint8_t *
+v4v_ring_map_page(struct v4v_ring_info *ring_info, int i)
+{
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    if ( i >= ring_info->npage )
+        return NULL;
+    if ( ring_info->mfn_mapping[i] )
+        return ring_info->mfn_mapping[i];
+    ring_info->mfn_mapping[i] = map_domain_page(mfn_x(ring_info->mfns[i]));
+
+    v4v_dprintk("mapping page %p to %p\n",
+                (void *)mfn_x(ring_info->mfns[i]),
+                ring_info->mfn_mapping[i]);
+    return ring_info->mfn_mapping[i];
+}
+
+static int
+v4v_memcpy_from_guest_ring(void *_dst, struct v4v_ring_info *ring_info,
+                           uint32_t offset, uint32_t len)
+{
+    int page = offset >> PAGE_SHIFT;
+    uint8_t *src;
+    uint8_t *dst = _dst;
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    offset &= PAGE_SIZE - 1;
+
+    while ( (offset + len) > PAGE_SIZE )
+    {
+        src = v4v_ring_map_page(ring_info, page);
+
+        if ( !src )
+            return -EFAULT;
+
+        v4v_dprintk("memcpy(%p,%p+%d,%d)\n",
+                    dst, src, offset,
+                    (int)(PAGE_SIZE - offset));
+        memcpy(dst, src + offset, PAGE_SIZE - offset);
+
+        page++;
+        len -= PAGE_SIZE - offset;
+        dst += PAGE_SIZE - offset;
+        offset = 0;
+    }
+
+    src = v4v_ring_map_page(ring_info, page);
+    if ( !src )
+        return -EFAULT;
+
+    v4v_dprintk("memcpy(%p,%p+%d,%d)\n", dst, src, offset, len);
+    memcpy(dst, src + offset, len);
+
+    return 0;
+}
+
+static int
+v4v_update_tx_ptr(struct v4v_ring_info *ring_info, uint32_t tx_ptr)
+{
+    uint8_t *dst = v4v_ring_map_page(ring_info, 0);
+    uint32_t *p = (uint32_t *)(dst + offsetof(v4v_ring_t, tx_ptr));
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    if ( !dst )
+        return -EFAULT;
+    write_atomic(p, tx_ptr);
+    mb();
+    return 0;
+}
+
+static int
+v4v_copy_from_guest_maybe(void *dst, void *src,
+                          XEN_GUEST_HANDLE(uint8_t) src_hnd,
+                          uint32_t len)
+{
+    int rc = 0;
+
+    if ( src )
+        memcpy(dst, src, len);
+    else
+        rc = copy_from_guest(dst, src_hnd, len);
+    return rc;
+}
+
+static int
+v4v_memcpy_to_guest_ring(struct v4v_ring_info *ring_info,
+                         uint32_t offset,
+                         void *src,
+                         XEN_GUEST_HANDLE(uint8_t) src_hnd,
+                         uint32_t len)
+{
+    int page = offset >> PAGE_SHIFT;
+    uint8_t *dst;
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    offset &= PAGE_SIZE - 1;
+
+    while ( (offset + len) > PAGE_SIZE )
+    {
+        dst = v4v_ring_map_page(ring_info, page);
+        if ( !dst )
+            return -EFAULT;
+
+        if ( v4v_copy_from_guest_maybe(dst + offset, src, src_hnd,
+                                       PAGE_SIZE - offset) )
+            return -EFAULT;
+
+        page++;
+        len -= PAGE_SIZE - offset;
+        if ( src )
+            src += (PAGE_SIZE - offset);
+        else
+            guest_handle_add_offset(src_hnd, PAGE_SIZE - offset);
+        offset = 0;
+    }
+
+    dst = v4v_ring_map_page(ring_info, page);
+    if ( !dst )
+        return -EFAULT;
+
+    if ( v4v_copy_from_guest_maybe(dst + offset, src, src_hnd, len) )
+        return -EFAULT;
+
+    return 0;
+}
+
+static int
+v4v_ringbuf_get_rx_ptr(struct domain *d, struct v4v_ring_info *ring_info,
+                        uint32_t * rx_ptr)
+{
+    v4v_ring_t *ringp;
+
+    if ( ring_info->npage == 0 )
+        return -1;
+
+    ringp = map_domain_page(mfn_x(ring_info->mfns[0]));
+
+    v4v_dprintk("v4v_ringbuf_payload_space: mapped %p to %p\n",
+                (void *)mfn_x(ring_info->mfns[0]), ringp);
+    if ( !ringp )
+        return -1;
+
+    write_atomic(rx_ptr, ringp->rx_ptr);
+    mb();
+
+    unmap_domain_page(mfn_x(ring_info->mfns[0]));
+    return 0;
+}
+
+uint32_t
+v4v_ringbuf_payload_space(struct domain * d, struct v4v_ring_info * ring_info)
+{
+    v4v_ring_t ring;
+    int32_t ret;
+
+    ring.tx_ptr = ring_info->tx_ptr;
+    ring.len = ring_info->len;
+
+    if ( v4v_ringbuf_get_rx_ptr(d, ring_info, &ring.rx_ptr) )
+        return 0;
+
+    v4v_dprintk("v4v_ringbuf_payload_space:tx_ptr=%d rx_ptr=%d\n",
+                (int)ring.tx_ptr, (int)ring.rx_ptr);
+    if ( ring.rx_ptr == ring.tx_ptr )
+        return ring.len - sizeof (struct v4v_ring_message_header);
+
+    ret = ring.rx_ptr - ring.tx_ptr;
+    if ( ret < 0 )
+        ret += ring.len;
+
+    ret -= sizeof (struct v4v_ring_message_header);
+    ret -= V4V_ROUNDUP(1);
+
+    return (ret < 0) ? 0 : ret;
+}
+
+static unsigned long
+v4v_iov_copy(v4v_iov_t *iov, internal_v4v_iov_t *iovs)
+{
+    if (iovs->xen_iov == NULL)
+    {
+        return copy_from_guest(iov, iovs->guest_iov, 1);
+    }
+    else
+    {
+        *iov = *(iovs->xen_iov);
+        return 0;
+    }
+}
+
+static void
+v4v_iov_add_offset(internal_v4v_iov_t *iovs, int offset)
+{
+    if (iovs->xen_iov == NULL)
+        guest_handle_add_offset(iovs->guest_iov, offset);
+    else
+        iovs->xen_iov += offset;
+}
+
+static long
+v4v_iov_count(internal_v4v_iov_t iovs, int niov)
+{
+    v4v_iov_t iov;
+    size_t ret = 0;
+
+    while ( niov-- )
+    {
+        if ( v4v_iov_copy(&iov, &iovs) )
+            return -EFAULT;
+
+        /* message bigger than 2G can't be sent */
+        if (ret > 2L * 1024 * 1024 * 1024)
+            return -EMSGSIZE;
+
+        ret += iov.iov_len;
+        v4v_iov_add_offset(&iovs, 1);
+    }
+
+    return ret;
+}
+
+static long
+v4v_ringbuf_insertv(struct domain *d,
+                    struct v4v_ring_info *ring_info,
+                    v4v_ring_id_t *src_id, uint32_t proto,
+                    internal_v4v_iov_t iovs, uint32_t niov,
+                    size_t len)
+{
+    v4v_ring_t ring;
+    struct v4v_ring_message_header mh = { 0 };
+    int32_t sp;
+    long happy_ret;
+    int32_t ret = 0;
+    XEN_GUEST_HANDLE(uint8_t) empty_hnd = { 0 };
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    happy_ret = len;
+
+    if ( (V4V_ROUNDUP(len) + sizeof (struct v4v_ring_message_header) ) >=
+            ring_info->len)
+        return -EMSGSIZE;
+
+    do
+    {
+        if ( (ret = v4v_memcpy_from_guest_ring(&ring, ring_info, 0,
+                                               sizeof (ring))) )
+            break;
+
+        ring.tx_ptr = ring_info->tx_ptr;
+        ring.len = ring_info->len;
+
+        v4v_dprintk("ring.tx_ptr=%d ring.rx_ptr=%d ring.len=%d ring_info->tx_ptr=%d\n",
+                    ring.tx_ptr, ring.rx_ptr, ring.len, ring_info->tx_ptr);
+
+        if ( ring.rx_ptr == ring.tx_ptr )
+            sp = ring_info->len;
+        else
+        {
+            sp = ring.rx_ptr - ring.tx_ptr;
+            if ( sp < 0 )
+                sp += ring.len;
+        }
+
+        if ( (V4V_ROUNDUP(len) + sizeof (struct v4v_ring_message_header)) >= sp )
+        {
+            v4v_dprintk("EAGAIN\n");
+            ret = -EAGAIN;
+            break;
+        }
+
+        mh.len = len + sizeof (struct v4v_ring_message_header);
+        mh.source = src_id->addr;
+        mh.message_type = proto;
+
+        if ( (ret = v4v_memcpy_to_guest_ring(ring_info,
+                                             ring.tx_ptr + sizeof (v4v_ring_t),
+                                             &mh, empty_hnd,
+                                             sizeof (mh))) )
+            break;
+
+        ring.tx_ptr += sizeof (mh);
+        if ( ring.tx_ptr == ring_info->len )
+            ring.tx_ptr = 0;
+
+        while ( niov-- )
+        {
+            XEN_GUEST_HANDLE(uint8_t) buf_hnd;
+            v4v_iov_t iov;
+
+            if ( v4v_iov_copy(&iov, &iovs) )
+            {
+                ret = -EFAULT;
+                break;
+            }
+
+            buf_hnd.p = (uint8_t *)iov.iov_base; //FIXME
+            len = iov.iov_len;
+
+            if ( unlikely(!guest_handle_okay(buf_hnd, len)) )
+            {
+                ret = -EFAULT;
+                break;
+            }
+
+            sp = ring.len - ring.tx_ptr;
+
+            if ( len > sp )
+            {
+                ret = v4v_memcpy_to_guest_ring(ring_info,
+                        ring.tx_ptr + sizeof (v4v_ring_t),
+                        NULL, buf_hnd, sp);
+                if ( ret )
+                    break;
+
+                ring.tx_ptr = 0;
+                len -= sp;
+                guest_handle_add_offset(buf_hnd, sp);
+            }
+
+            ret = v4v_memcpy_to_guest_ring(ring_info,
+                    ring.tx_ptr + sizeof (v4v_ring_t),
+                    NULL, buf_hnd, len);
+            if ( ret )
+                break;
+
+            ring.tx_ptr += len;
+
+            if ( ring.tx_ptr == ring_info->len )
+                ring.tx_ptr = 0;
+
+            v4v_iov_add_offset(&iovs, 1);
+        }
+        if ( ret )
+            break;
+
+        ring.tx_ptr = V4V_ROUNDUP(ring.tx_ptr);
+
+        if ( ring.tx_ptr >= ring_info->len )
+            ring.tx_ptr -= ring_info->len;
+
+        mb();
+        ring_info->tx_ptr = ring.tx_ptr;
+        if ( (ret = v4v_update_tx_ptr(ring_info, ring.tx_ptr)) )
+            break;
+    }
+    while ( 0 );
+
+    v4v_ring_unmap(ring_info);
+
+    return ret ? ret : happy_ret;
+}
+
+
+
+/* pending */
+static void
+v4v_pending_remove_ent(struct v4v_pending_ent *ent)
+{
+    hlist_del(&ent->node);
+    xfree(ent);
+}
+
+static void
+v4v_pending_remove_all(struct v4v_ring_info *info)
+{
+    struct hlist_node *node, *next;
+    struct v4v_pending_ent *pending_ent;
+
+    ASSERT(spin_is_locked(&info->lock));
+    hlist_for_each_entry_safe(pending_ent, node, next, &info->pending,
+            node) v4v_pending_remove_ent(pending_ent);
+}
+
+static void
+v4v_pending_notify(struct domain *caller_d, struct hlist_head *to_notify)
+{
+    struct hlist_node *node, *next;
+    struct v4v_pending_ent *pending_ent;
+
+    ASSERT(rw_is_locked(&v4v_lock));
+
+    hlist_for_each_entry_safe(pending_ent, node, next, to_notify, node)
+    {
+        hlist_del(&pending_ent->node);
+        v4v_signal_domid(pending_ent->id);
+        xfree(pending_ent);
+    }
+
+}
+
+static void
+v4v_pending_find(struct domain *d, struct v4v_ring_info *ring_info,
+                 uint32_t payload_space, struct hlist_head *to_notify)
+{
+    struct hlist_node *node, *next;
+    struct v4v_pending_ent *ent;
+
+    ASSERT(rw_is_locked(&d->v4v->lock));
+
+    spin_lock(&ring_info->lock);
+    hlist_for_each_entry_safe(ent, node, next, &ring_info->pending, node)
+    {
+        if ( payload_space >= ent->len )
+        {
+            hlist_del(&ent->node);
+            hlist_add_head(&ent->node, to_notify);
+        }
+    }
+    spin_unlock(&ring_info->lock);
+}
+
+/*caller must have L3 */
+static int
+v4v_pending_queue(struct v4v_ring_info *ring_info, domid_t src_id, int len)
+{
+    struct v4v_pending_ent *ent = xmalloc(struct v4v_pending_ent);
+
+    if ( !ent )
+    {
+        v4v_dprintk("ENOMEM\n");
+        return -ENOMEM;
+    }
+
+    ent->len = len;
+    ent->id = src_id;
+
+    hlist_add_head(&ent->node, &ring_info->pending);
+
+    return 0;
+}
+
+/* L3 */
+static int
+v4v_pending_requeue(struct v4v_ring_info *ring_info, domid_t src_id, int len)
+{
+    struct hlist_node *node;
+    struct v4v_pending_ent *ent;
+
+    hlist_for_each_entry(ent, node, &ring_info->pending, node)
+    {
+        if ( ent->id == src_id )
+        {
+            if ( ent->len < len )
+                ent->len = len;
+            return 0;
+        }
+    }
+
+    return v4v_pending_queue(ring_info, src_id, len);
+}
+
+
+/* L3 */
+static void
+v4v_pending_cancel(struct v4v_ring_info *ring_info, domid_t src_id)
+{
+    struct hlist_node *node, *next;
+    struct v4v_pending_ent *ent;
+
+    hlist_for_each_entry_safe(ent, node, next, &ring_info->pending, node)
+    {
+        if ( ent->id == src_id)
+        {
+            hlist_del(&ent->node);
+            xfree(ent);
+        }
+    }
+}
+
+/*
+ * ring data
+ */
+
+/*Caller should hold R(L1)*/
+static int
+v4v_fill_ring_data(struct domain *src_d,
+                   XEN_GUEST_HANDLE(v4v_ring_data_ent_t) data_ent_hnd)
+{
+    v4v_ring_data_ent_t ent;
+    struct domain *dst_d;
+    struct v4v_ring_info *ring_info;
+
+    if ( copy_from_guest(&ent, data_ent_hnd, 1) )
+    {
+        v4v_dprintk("EFAULT\n");
+        return -EFAULT;
+    }
+
+    v4v_dprintk("v4v_fill_ring_data: ent.ring.domain=%d,ent.ring.port=%u\n",
+                (int)ent.ring.domain, (int)ent.ring.port);
+
+    ent.flags = 0;
+
+    dst_d = get_domain_by_id(ent.ring.domain);
+
+    if ( dst_d && dst_d->v4v )
+    {
+        read_lock(&dst_d->v4v->lock);
+        ring_info = v4v_ring_find_info_by_addr(dst_d, &ent.ring,
+                                               src_d->domain_id);
+
+        if ( ring_info )
+        {
+            uint32_t space_avail;
+
+            ent.flags |= V4V_RING_DATA_F_EXISTS;
+            ent.max_message_size =
+                ring_info->len - sizeof (struct v4v_ring_message_header) -
+                V4V_ROUNDUP(1);
+            spin_lock(&ring_info->lock);
+
+            space_avail = v4v_ringbuf_payload_space(dst_d, ring_info);
+
+            if ( space_avail >= ent.space_required )
+            {
+                v4v_pending_cancel(ring_info, src_d->domain_id);
+                ent.flags |= V4V_RING_DATA_F_SUFFICIENT;
+            }
+            else
+            {
+                v4v_pending_requeue(ring_info, src_d->domain_id,
+                        ent.space_required);
+                ent.flags |= V4V_RING_DATA_F_PENDING;
+            }
+
+            spin_unlock(&ring_info->lock);
+
+            if ( space_avail == ent.max_message_size )
+                ent.flags |= V4V_RING_DATA_F_EMPTY;
+
+        }
+        read_unlock(&dst_d->v4v->lock);
+    }
+
+    if ( dst_d )
+        put_domain(dst_d);
+
+    if ( copy_field_to_guest(data_ent_hnd, &ent, flags) )
+    {
+        v4v_dprintk("EFAULT\n");
+        return -EFAULT;
+    }
+    return 0;
+}
+
+/*Called should hold no more than R(L1) */
+static int
+v4v_fill_ring_datas(struct domain *d, int nent,
+                     XEN_GUEST_HANDLE(v4v_ring_data_ent_t) data_ent_hnd)
+{
+    int ret = 0;
+
+    read_lock(&v4v_lock);
+    while ( !ret && nent-- )
+    {
+        ret = v4v_fill_ring_data(d, data_ent_hnd);
+        guest_handle_add_offset(data_ent_hnd, 1);
+    }
+    read_unlock(&v4v_lock);
+    return ret;
+}
+
+/*
+ * ring
+ */
+static int
+v4v_find_ring_mfns(struct domain *d, struct v4v_ring_info *ring_info,
+                   uint32_t npage, XEN_GUEST_HANDLE(xen_pfn_t) pfn_hnd)
+{
+    int i,j;
+    mfn_t *mfns;
+    uint8_t **mfn_mapping;
+    unsigned long mfn;
+    struct page_info *page;
+    int ret = 0;
+
+    if ( (npage << PAGE_SHIFT) < ring_info->len )
+    {
+        v4v_dprintk("EINVAL\n");
+        return -EINVAL;
+    }
+
+    mfns = xmalloc_array(mfn_t, npage);
+    if ( !mfns )
+    {
+        v4v_dprintk("ENOMEM\n");
+        return -ENOMEM;
+    }
+
+    mfn_mapping = xmalloc_array(uint8_t *, npage);
+    if ( !mfn_mapping )
+    {
+        xfree(mfns);
+        return -ENOMEM;
+    }
+
+    for ( i = 0; i < npage; ++i )
+    {
+        unsigned long pfn;
+        p2m_type_t p2mt;
+
+        if ( copy_from_guest_offset(&pfn, pfn_hnd, i, 1) )
+        {
+            ret = -EFAULT;
+            v4v_dprintk("EFAULT\n");
+            break;
+        }
+
+        mfn = mfn_x(get_gfn(d, pfn, &p2mt));
+        if ( !mfn_valid(mfn) )
+        {
+            printk(KERN_ERR "v4v domain %d passed invalid mfn %"PRI_mfn" ring %p seq %d\n",
+                    d->domain_id, mfn, ring_info, i);
+            ret = -EINVAL;
+            break;
+        }
+        page = mfn_to_page(mfn);
+        if ( !get_page_and_type(page, d, PGT_writable_page) )
+        {
+            printk(KERN_ERR "v4v domain %d passed wrong type mfn %"PRI_mfn" ring %p seq %d\n",
+                    d->domain_id, mfn, ring_info, i);
+            ret = -EINVAL;
+            break;
+        }
+        mfns[i] = _mfn(mfn);
+        v4v_dprintk("v4v_find_ring_mfns: %d: %lx -> %lx\n",
+                    i, (unsigned long)pfn, (unsigned long)mfn_x(mfns[i]));
+        mfn_mapping[i] = NULL;
+        put_gfn(d, pfn);
+    }
+
+    if ( !ret )
+    {
+        ring_info->npage = npage;
+        ring_info->mfns = mfns;
+        ring_info->mfn_mapping = mfn_mapping;
+    }
+    else
+    {
+        j = i;
+        for ( i = 0; i < j; ++i )
+            if ( mfn_x(mfns[i]) != 0 )
+                put_page_and_type(mfn_to_page(mfn_x(mfns[i])));
+        xfree(mfn_mapping);
+        xfree(mfns);
+    }
+    return ret;
+}
+
+
+static struct v4v_ring_info *
+v4v_ring_find_info(struct domain *d, v4v_ring_id_t *id)
+{
+    uint16_t hash;
+    struct hlist_node *node;
+    struct v4v_ring_info *ring_info;
+
+    ASSERT(rw_is_locked(&d->v4v->lock));
+
+    hash = v4v_hash_fn(id);
+
+    v4v_dprintk("ring_find_info: d->v4v=%p, d->v4v->ring_hash[%d]=%p id=%p\n",
+                d->v4v, (int)hash, d->v4v->ring_hash[hash].first, id);
+    v4v_dprintk("ring_find_info: id.addr.port=%d id.addr.domain=%d id.addr.partner=%d\n",
+                id->addr.port, id->addr.domain, id->partner);
+
+    hlist_for_each_entry(ring_info, node, &d->v4v->ring_hash[hash], node)
+    {
+        v4v_ring_id_t *cmpid = &ring_info->id;
+
+        if ( cmpid->addr.port == id->addr.port &&
+             cmpid->addr.domain == id->addr.domain &&
+             cmpid->partner == id->partner)
+        {
+            v4v_dprintk("ring_find_info: ring_info=%p\n", ring_info);
+            return ring_info;
+        }
+    }
+    v4v_dprintk("ring_find_info: no ring_info found\n");
+    return NULL;
+}
+
+static struct v4v_ring_info *
+v4v_ring_find_info_by_addr(struct domain *d, struct v4v_addr *a, domid_t p)
+{
+    v4v_ring_id_t id;
+    struct v4v_ring_info *ret;
+
+    ASSERT(rw_is_locked(&d->v4v->lock));
+
+    if ( !a )
+        return NULL;
+
+    id.addr.port = a->port;
+    id.addr.domain = d->domain_id;
+    id.partner = p;
+
+    ret = v4v_ring_find_info(d, &id);
+    if ( ret )
+        return ret;
+
+    id.partner = V4V_DOMID_ANY;
+
+    return v4v_ring_find_info(d, &id);
+}
+
+static void v4v_ring_remove_mfns(struct domain *d, struct v4v_ring_info *ring_info)
+{
+    int i;
+
+    ASSERT(rw_is_write_locked(&d->v4v->lock));
+
+    if ( ring_info->mfns )
+    {
+        for ( i = 0; i < ring_info->npage; ++i )
+            if ( mfn_x(ring_info->mfns[i]) != 0 )
+                put_page_and_type(mfn_to_page(mfn_x(ring_info->mfns[i])));
+        xfree(ring_info->mfns);
+    }
+    if ( ring_info->mfn_mapping )
+        xfree(ring_info->mfn_mapping);
+    ring_info->mfns = NULL;
+
+}
+
+static void
+v4v_ring_remove_info(struct domain *d, struct v4v_ring_info *ring_info)
+{
+    ASSERT(rw_is_write_locked(&d->v4v->lock));
+
+    spin_lock(&ring_info->lock);
+
+    v4v_pending_remove_all(ring_info);
+    hlist_del(&ring_info->node);
+    v4v_ring_remove_mfns(d, ring_info);
+
+    spin_unlock(&ring_info->lock);
+
+    xfree(ring_info);
+}
+
+/* Call from guest to unpublish a ring */
+static long
+v4v_ring_remove(struct domain *d, XEN_GUEST_HANDLE(v4v_ring_t) ring_hnd)
+{
+    struct v4v_ring ring;
+    struct v4v_ring_info *ring_info;
+    int ret = 0;
+
+    read_lock(&v4v_lock);
+
+    do
+    {
+        if ( !d->v4v )
+        {
+            v4v_dprintk("EINVAL\n");
+            ret = -EINVAL;
+            break;
+        }
+
+        if ( copy_from_guest(&ring, ring_hnd, 1) )
+        {
+            v4v_dprintk("EFAULT\n");
+            ret = -EFAULT;
+            break;
+        }
+
+        if ( ring.magic != V4V_RING_MAGIC )
+        {
+            v4v_dprintk("ring.magic(%lx) != V4V_RING_MAGIC(%lx), EINVAL\n",
+                    ring.magic, V4V_RING_MAGIC);
+            ret = -EINVAL;
+            break;
+        }
+
+        ring.id.addr.domain = d->domain_id;
+
+        write_lock(&d->v4v->lock);
+        ring_info = v4v_ring_find_info(d, &ring.id);
+
+        if ( ring_info )
+            v4v_ring_remove_info(d, ring_info);
+
+        write_unlock(&d->v4v->lock);
+
+        if ( !ring_info )
+        {
+            v4v_dprintk("ENOENT\n");
+            ret = -ENOENT;
+            break;
+        }
+
+    }
+    while ( 0 );
+
+    read_unlock(&v4v_lock);
+    return ret;
+}
+
+/* call from guest to publish a ring */
+static long
+v4v_ring_add(struct domain *d, XEN_GUEST_HANDLE(v4v_ring_t) ring_hnd,
+             uint32_t npage, XEN_GUEST_HANDLE(xen_pfn_t) pfn_hnd)
+{
+    struct v4v_ring ring;
+    struct v4v_ring_info *ring_info;
+    int need_to_insert = 0;
+    int ret = 0;
+
+    if ( (long)ring_hnd.p & (PAGE_SIZE - 1) )
+    {
+        v4v_dprintk("EINVAL\n");
+        return -EINVAL;
+    }
+
+    read_lock(&v4v_lock);
+    do
+    {
+        if ( !d->v4v )
+        {
+            v4v_dprintk(" !d->v4v, EINVAL\n");
+            ret = -EINVAL;
+            break;
+        }
+
+        if ( copy_from_guest(&ring, ring_hnd, 1) )
+        {
+            v4v_dprintk(" copy_from_guest failed, EFAULT\n");
+            ret = -EFAULT;
+            break;
+        }
+
+        if ( ring.magic != V4V_RING_MAGIC )
+        {
+            v4v_dprintk("ring.magic(%lx) != V4V_RING_MAGIC(%lx), EINVAL\n",
+                        ring.magic, V4V_RING_MAGIC);
+            ret = -EINVAL;
+            break;
+        }
+
+        if ( (ring.len <
+                    (sizeof (struct v4v_ring_message_header) + V4V_ROUNDUP(1) +
+                     V4V_ROUNDUP(1))) || (V4V_ROUNDUP(ring.len) != ring.len) )
+        {
+            v4v_dprintk("EINVAL\n");
+            ret = -EINVAL;
+            break;
+        }
+
+        ring.id.addr.domain = d->domain_id;
+        if ( copy_field_to_guest(ring_hnd, &ring, id) )
+        {
+            v4v_dprintk("EFAULT\n");
+            ret = -EFAULT;
+            break;
+        }
+
+        /*
+         * no need for a lock yet, because only we know about this
+         * set the tx pointer if it looks bogus (we don't reset it
+         * because this might be a re-register after S4)
+         */
+        if ( (ring.tx_ptr >= ring.len)
+                || (V4V_ROUNDUP(ring.tx_ptr) != ring.tx_ptr) )
+        {
+            ring.tx_ptr = ring.rx_ptr;
+        }
+        copy_field_to_guest(ring_hnd, &ring, tx_ptr);
+
+        read_lock(&d->v4v->lock);
+        ring_info = v4v_ring_find_info(d, &ring.id);
+
+        if ( !ring_info )
+        {
+            read_unlock(&d->v4v->lock);
+            ring_info = xmalloc(struct v4v_ring_info);
+            if ( !ring_info )
+            {
+                v4v_dprintk("ENOMEM\n");
+                ret = -ENOMEM;
+                break;
+            }
+            need_to_insert++;
+            spin_lock_init(&ring_info->lock);
+            INIT_HLIST_HEAD(&ring_info->pending);
+            ring_info->mfns = NULL;
+
+        }
+        else
+        {
+            /*
+             * Ring info already existed. If mfn list was already
+             * populated remove the MFN's from list and then add
+             * the new list.
+             */
+            printk(KERN_INFO "v4v: dom%d ring already registered\n",
+                    current->domain->domain_id);
+            ret = -EEXIST;
+            break;
+        }
+
+        spin_lock(&ring_info->lock);
+        ring_info->id = ring.id;
+        ring_info->len = ring.len;
+        ring_info->tx_ptr = ring.tx_ptr;
+        ring_info->ring = ring_hnd;
+        if ( ring_info->mfns )
+            xfree(ring_info->mfns);
+        ret = v4v_find_ring_mfns(d, ring_info, npage, pfn_hnd);
+        spin_unlock(&ring_info->lock);
+        if ( ret )
+            break;
+
+        if ( !need_to_insert )
+        {
+            read_unlock(&d->v4v->lock);
+        }
+        else
+        {
+            uint16_t hash = v4v_hash_fn(&ring.id);
+            write_lock(&d->v4v->lock);
+            hlist_add_head(&ring_info->node, &d->v4v->ring_hash[hash]);
+            write_unlock(&d->v4v->lock);
+        }
+    }
+    while ( 0 );
+
+    read_unlock(&v4v_lock);
+    return ret;
+}
+
+
+/*
+ * io
+ */
+
+static void
+v4v_notify_ring(struct domain *d, struct v4v_ring_info *ring_info,
+                struct hlist_head *to_notify)
+{
+    uint32_t space;
+
+    ASSERT(rw_is_locked(&v4v_lock));
+    ASSERT(rw_is_locked(&d->v4v->lock));
+
+    spin_lock(&ring_info->lock);
+    space = v4v_ringbuf_payload_space(d, ring_info);
+    spin_unlock(&ring_info->lock);
+
+    v4v_pending_find(d, ring_info, space, to_notify);
+}
+
+/*notify hypercall*/
+static long
+v4v_notify(struct domain *d,
+           XEN_GUEST_HANDLE(v4v_ring_data_t) ring_data_hnd)
+{
+    v4v_ring_data_t ring_data;
+    HLIST_HEAD(to_notify);
+    int i;
+    int ret = 0;
+
+    read_lock(&v4v_lock);
+
+    if ( !d->v4v )
+    {
+        read_unlock(&v4v_lock);
+        v4v_dprintk("!d->v4v, ENODEV\n");
+        return -ENODEV;
+    }
+
+    read_lock(&d->v4v->lock);
+    for ( i = 0; i < V4V_HTABLE_SIZE; ++i )
+    {
+        struct hlist_node *node, *next;
+        struct v4v_ring_info *ring_info;
+
+        hlist_for_each_entry_safe(ring_info, node,
+                next, &d->v4v->ring_hash[i],
+                node)
+        {
+            v4v_notify_ring(d, ring_info, &to_notify);
+        }
+    }
+    read_unlock(&d->v4v->lock);
+
+    if ( !hlist_empty(&to_notify) )
+        v4v_pending_notify(d, &to_notify);
+
+    do
+    {
+        if ( !guest_handle_is_null(ring_data_hnd) )
+        {
+            /* Quick sanity check on ring_data_hnd */
+            if ( copy_field_from_guest(&ring_data, ring_data_hnd, magic) )
+            {
+                v4v_dprintk("copy_field_from_guest failed\n");
+                ret = -EFAULT;
+                break;
+            }
+
+            if ( ring_data.magic != V4V_RING_DATA_MAGIC )
+            {
+                v4v_dprintk("ring.magic(%lx) != V4V_RING_MAGIC(%lx), EINVAL\n",
+                        ring_data.magic, V4V_RING_MAGIC);
+                ret = -EINVAL;
+                break;
+            }
+
+            if ( copy_from_guest(&ring_data, ring_data_hnd, 1) )
+            {
+                v4v_dprintk("copy_from_guest failed\n");
+                ret = -EFAULT;
+                break;
+            }
+
+            {
+                XEN_GUEST_HANDLE(v4v_ring_data_ent_t) ring_data_ent_hnd;
+                ring_data_ent_hnd =
+                    guest_handle_for_field(ring_data_hnd, v4v_ring_data_ent_t, data[0]);
+                ret = v4v_fill_ring_datas(d, ring_data.nent, ring_data_ent_hnd);
+            }
+        }
+    }
+    while ( 0 );
+
+    read_unlock(&v4v_lock);
+
+    return ret;
+}
+
+#ifdef V4V_DEBUG
+void
+v4v_viptables_print_rule(struct v4v_viptables_rule_node *node)
+{
+    v4v_viptables_rule_t *rule;
+
+    if ( node == NULL )
+    {
+        printk("(null)\n");
+        return;
+    }
+
+    rule = &node->rule;
+
+    if ( rule->accept == 1 )
+        printk("ACCEPT");
+    else
+        printk("REJECT");
+
+    printk(" ");
+
+    if ( rule->src.domain == V4V_DOMID_ANY )
+        printk("*");
+    else
+        printk("%i", rule->src.domain);
+
+    printk(":");
+
+    if ( rule->src.port == -1 )
+        printk("*");
+    else
+        printk("%i", rule->src.port);
+
+    printk(" -> ");
+
+    if ( rule->dst.domain == V4V_DOMID_ANY )
+        printk("*");
+    else
+        printk("%i", rule->dst.domain);
+
+    printk(":");
+
+    if ( rule->dst.port == -1 )
+        printk("*");
+    else
+        printk("%i", rule->dst.port);
+
+    printk("\n");
+}
+#endif /* V4V_DEBUG */
+
+int
+v4v_viptables_add(struct domain *src_d,
+                  XEN_GUEST_HANDLE(v4v_viptables_rule_t) rule,
+                  int32_t position)
+{
+    struct v4v_viptables_rule_node* new = NULL;
+    struct list_head* tmp;
+
+    ASSERT(rw_is_write_locked(&viprules_lock));
+
+    /* First rule is n.1 */
+    position--;
+
+    new = xmalloc(struct v4v_viptables_rule_node);
+    if ( new == NULL )
+        return -ENOMEM;
+
+    if ( copy_from_guest(&new->rule, rule, 1) )
+    {
+        xfree(new);
+        return -EFAULT;
+    }
+
+#ifdef V4V_DEBUG
+    printk(KERN_ERR "VIPTables: ");
+    v4v_viptables_print_rule(new);
+#endif /* V4V_DEBUG */
+
+    tmp = &viprules;
+    while ( position != 0 && tmp->next != &viprules )
+    {
+        tmp = tmp->next;
+        position--;
+    }
+    list_add(&new->list, tmp);
+
+    return 0;
+}
+
+int
+v4v_viptables_del(struct domain *src_d,
+                  XEN_GUEST_HANDLE(v4v_viptables_rule_t) rule_hnd,
+                  int32_t position)
+{
+    struct list_head *tmp = NULL;
+    struct list_head *next = NULL;
+    struct v4v_viptables_rule_node *node;
+
+    ASSERT(rw_is_write_locked(&viprules_lock));
+
+    v4v_dprintk("v4v_viptables_del position:%d\n", position);
+
+    if ( position != -1 )
+    {
+        /* We want to delete the rule number <position> */
+        tmp = &viprules;
+        while ( position != 0 && tmp->next != &viprules )
+        {
+            tmp = tmp->next;
+            position--;
+        }
+    }
+    else if ( !guest_handle_is_null(rule_hnd) )
+    {
+        struct v4v_viptables_rule r;
+
+        if ( copy_from_guest(&r, rule_hnd, 1) )
+            return -EFAULT;
+
+        list_for_each(tmp, &viprules)
+        {
+            node = list_entry(tmp, struct v4v_viptables_rule_node, list);
+
+            if ( (node->rule.src.domain == r.src.domain) &&
+                 (node->rule.src.port   == r.src.port)   &&
+                 (node->rule.dst.domain == r.dst.domain) &&
+                 (node->rule.dst.port   == r.dst.port))
+            {
+                position = 0;
+                break;
+            }
+        }
+    }
+    else
+    {
+        /* We want to flush the rules! */
+        printk(KERN_ERR "VIPTables: flushing rules\n");
+        list_for_each_safe(tmp, next, &viprules)
+        {
+            node = list_entry(tmp, struct v4v_viptables_rule_node, list);
+            list_del(tmp);
+            xfree(node);
+        }
+    }
+
+    if ( position == 0 && tmp != &viprules )
+    {
+        node = list_entry(tmp, struct v4v_viptables_rule_node, list);
+#ifdef V4V_DEBUG
+        printk(KERN_ERR "VIPTables: deleting rule: ");
+        v4v_viptables_print_rule(node);
+#endif /* V4V_DEBUG */
+        list_del(tmp);
+        xfree(node);
+    }
+
+    return 0;
+}
+
+static size_t
+v4v_viptables_list(struct domain *src_d,
+                   XEN_GUEST_HANDLE(v4v_viptables_list_t) list_hnd)
+{
+    struct list_head *ptr;
+    struct v4v_viptables_rule_node *node;
+    struct v4v_viptables_list rules_list;
+    uint32_t nbrules;
+    XEN_GUEST_HANDLE(v4v_viptables_rule_t) guest_rules;
+
+    ASSERT(rw_is_locked(&viprules_lock));
+
+    memset(&rules_list, 0, sizeof (rules_list));
+    if ( copy_from_guest(&rules_list, list_hnd, 1) )
+        return -EFAULT;
+
+    ptr = viprules.next;
+    while ( rules_list.start_rule != 0 && ptr->next != &viprules )
+    {
+        ptr = ptr->next;
+        rules_list.start_rule--;
+    }
+
+    if ( rules_list.nb_rules == 0 )
+        return -EINVAL;
+
+    guest_rules = guest_handle_for_field(list_hnd, v4v_viptables_rule_t, rules[0]);
+
+    nbrules = 0;
+    while ( nbrules < rules_list.nb_rules && ptr != &viprules )
+    {
+        node = list_entry(ptr, struct v4v_viptables_rule_node, list);
+
+        if ( !guest_handle_okay(guest_rules, 1) )
+            break;
+
+        if ( copy_to_guest(guest_rules, &node->rule, 1) )
+            break;
+
+        guest_handle_add_offset(guest_rules, 1);
+
+        nbrules++;
+        ptr = ptr->next;
+    }
+
+    rules_list.nb_rules = nbrules;
+    if ( copy_field_to_guest(list_hnd, &rules_list, nb_rules) )
+        return -EFAULT;
+
+    return 0;
+}
+
+static size_t
+v4v_viptables_check(v4v_addr_t * src, v4v_addr_t * dst)
+{
+    struct list_head *ptr;
+    struct v4v_viptables_rule_node *node;
+    size_t ret = 0; /* Defaulting to ACCEPT */
+
+    read_lock(&viprules_lock);
+
+    list_for_each(ptr, &viprules)
+    {
+        node = list_entry(ptr, struct v4v_viptables_rule_node, list);
+
+        if ( (node->rule.src.domain == V4V_DOMID_ANY ||
+              node->rule.src.domain == src->domain) &&
+             (node->rule.src.port == V4V_PORT_ANY ||
+              node->rule.src.port == src->port) &&
+             (node->rule.dst.domain == V4V_DOMID_ANY ||
+              node->rule.dst.domain == dst->domain) &&
+             (node->rule.dst.port == V4V_PORT_ANY ||
+              node->rule.dst.port == dst->port) )
+        {
+            ret = !node->rule.accept;
+            break;
+        }
+    }
+
+    read_unlock(&viprules_lock);
+    return ret;
+}
+
+/*
+ * Hypercall to do the send
+ */
+static long
+v4v_sendv(struct domain *src_d, v4v_addr_t * src_addr,
+          v4v_addr_t * dst_addr, uint32_t proto,
+          internal_v4v_iov_t iovs, size_t niov)
+{
+    struct domain *dst_d;
+    v4v_ring_id_t src_id;
+    struct v4v_ring_info *ring_info;
+    int ret = 0;
+
+    if ( !dst_addr )
+    {
+        v4v_dprintk("!dst_addr, EINVAL\n");
+        return -EINVAL;
+    }
+
+    read_lock(&v4v_lock);
+    if ( !src_d->v4v )
+    {
+        read_unlock(&v4v_lock);
+        v4v_dprintk("!src_d->v4v, EINVAL\n");
+        return -EINVAL;
+    }
+
+    src_id.addr.port = src_addr->port;
+    src_id.addr.domain = src_d->domain_id;
+    src_id.partner = dst_addr->domain;
+
+    dst_d = get_domain_by_id(dst_addr->domain);
+    if ( !dst_d )
+    {
+        read_unlock(&v4v_lock);
+        v4v_dprintk("!dst_d, ECONNREFUSED\n");
+        return -ECONNREFUSED;
+    }
+
+    if ( v4v_viptables_check(src_addr, dst_addr) != 0 )
+    {
+        read_unlock(&v4v_lock);
+        gdprintk(XENLOG_WARNING,
+                 "V4V: VIPTables REJECTED %i:%i -> %i:%i\n",
+                 src_addr->domain, src_addr->port,
+                 dst_addr->domain, dst_addr->port);
+        return -ECONNREFUSED;
+    }
+
+    do
+    {
+        if ( !dst_d->v4v )
+        {
+            v4v_dprintk("dst_d->v4v, ECONNREFUSED\n");
+            ret = -ECONNREFUSED;
+            break;
+        }
+
+        read_lock(&dst_d->v4v->lock);
+        ring_info =
+            v4v_ring_find_info_by_addr(dst_d, dst_addr, src_addr->domain);
+
+        if ( !ring_info )
+        {
+            ret = -ECONNREFUSED;
+            v4v_dprintk(" !ring_info, ECONNREFUSED\n");
+        }
+        else
+        {
+            long len = v4v_iov_count(iovs, niov);
+
+            if ( len < 0 )
+            {
+                ret = len;
+                break;
+            }
+
+            spin_lock(&ring_info->lock);
+            ret =
+                v4v_ringbuf_insertv(dst_d, ring_info, &src_id, proto, iovs,
+                        niov, len);
+            if ( ret == -EAGAIN )
+            {
+                v4v_dprintk("v4v_ringbuf_insertv failed, EAGAIN\n");
+                /* Schedule a wake up on the event channel when space is there */
+                if ( v4v_pending_requeue(ring_info, src_d->domain_id, len) )
+                {
+                    v4v_dprintk("v4v_pending_requeue failed, ENOMEM\n");
+                    ret = -ENOMEM;
+                }
+            }
+            spin_unlock(&ring_info->lock);
+
+            if ( ret >= 0 )
+            {
+                v4v_signal_domain(dst_d);
+            }
+
+        }
+        read_unlock(&dst_d->v4v->lock);
+
+    }
+    while ( 0 );
+
+    put_domain(dst_d);
+    read_unlock(&v4v_lock);
+    return ret;
+}
+
+static void
+v4v_info(struct domain *d, v4v_info_t *info)
+{
+    read_lock(&d->v4v->lock);
+    info->ring_magic = V4V_RING_MAGIC;
+    info->data_magic = V4V_RING_DATA_MAGIC;
+    info->evtchn = d->v4v->evtchn_port;
+    read_unlock(&d->v4v->lock);
+}
+
+/*
+ * hypercall glue
+ */
+long
+do_v4v_op(int cmd, XEN_GUEST_HANDLE(void) arg1,
+          XEN_GUEST_HANDLE(void) arg2,
+          uint32_t arg3, uint32_t arg4)
+{
+    struct domain *d = current->domain;
+    long rc = -EFAULT;
+
+    v4v_dprintk("->do_v4v_op(%d,%p,%p,%d,%d)\n", cmd,
+                (void *)arg1.p, (void *)arg2.p, (int) arg3, (int) arg4);
+
+    domain_lock(d);
+    switch (cmd)
+    {
+        case V4VOP_register_ring:
+            {
+                XEN_GUEST_HANDLE(v4v_ring_t) ring_hnd =
+                    guest_handle_cast(arg1, v4v_ring_t);
+                XEN_GUEST_HANDLE(xen_pfn_t) pfn_hnd =
+                    guest_handle_cast(arg2, xen_pfn_t);
+                uint32_t npage = arg3;
+                if ( unlikely(!guest_handle_okay(ring_hnd, 1)) )
+                    goto out;
+                if ( unlikely(!guest_handle_okay(pfn_hnd, npage)) )
+                    goto out;
+                rc = v4v_ring_add(d, ring_hnd, npage, pfn_hnd);
+                break;
+            }
+        case V4VOP_unregister_ring:
+            {
+                XEN_GUEST_HANDLE(v4v_ring_t) ring_hnd =
+                    guest_handle_cast(arg1, v4v_ring_t);
+                if ( unlikely(!guest_handle_okay(ring_hnd, 1)) )
+                    goto out;
+                rc = v4v_ring_remove(d, ring_hnd);
+                break;
+            }
+        case V4VOP_send:
+            {
+                uint32_t len = arg3;
+                uint32_t message_type = arg4;
+                v4v_iov_t iov;
+                internal_v4v_iov_t iovs;
+                XEN_GUEST_HANDLE(v4v_send_addr_t) addr_hnd =
+                    guest_handle_cast(arg1, v4v_send_addr_t);
+                v4v_send_addr_t addr;
+
+                if ( unlikely(!guest_handle_okay(addr_hnd, 1)) )
+                    goto out;
+                if ( copy_from_guest(&addr, addr_hnd, 1) )
+                    goto out;
+
+                iov.iov_base = (uint64_t)arg2.p; // FIXME
+                iov.iov_len = len;
+                iovs.xen_iov = &iov;
+                rc = v4v_sendv(d, &addr.src, &addr.dst, message_type, iovs, 1);
+                break;
+            }
+        case V4VOP_sendv:
+            {
+                internal_v4v_iov_t iovs;
+                uint32_t niov = arg3;
+                uint32_t message_type = arg4;
+                XEN_GUEST_HANDLE(v4v_send_addr_t) addr_hnd =
+                    guest_handle_cast(arg1, v4v_send_addr_t);
+                v4v_send_addr_t addr;
+
+                memset(&iovs, 0, sizeof (iovs));
+                iovs.guest_iov = guest_handle_cast(arg2, v4v_iov_t);
+
+                if ( unlikely(!guest_handle_okay(addr_hnd, 1)) )
+                    goto out;
+                if ( copy_from_guest(&addr, addr_hnd, 1) )
+                    goto out;
+
+                if ( unlikely(!guest_handle_okay(iovs.guest_iov, niov)) )
+                    goto out;
+
+                rc = v4v_sendv(d, &addr.src, &addr.dst, message_type, iovs, niov);
+                break;
+            }
+        case V4VOP_notify:
+            {
+                XEN_GUEST_HANDLE(v4v_ring_data_t) ring_data_hnd =
+                    guest_handle_cast(arg1, v4v_ring_data_t);
+                rc = v4v_notify(d, ring_data_hnd);
+                break;
+            }
+        case V4VOP_viptables_add:
+            {
+                uint32_t position = arg3;
+                XEN_GUEST_HANDLE(v4v_viptables_rule_t) rule_hnd =
+                    guest_handle_cast(arg1, v4v_viptables_rule_t);
+                rc = -EPERM;
+                if ( !IS_PRIV(d) )
+                    goto out;
+
+                write_lock(&viprules_lock);
+                rc = v4v_viptables_add(d, rule_hnd, position);
+                write_unlock(&viprules_lock);
+                break;
+            }
+        case V4VOP_viptables_del:
+            {
+                uint32_t position = arg3;
+                XEN_GUEST_HANDLE(v4v_viptables_rule_t) rule_hnd =
+                    guest_handle_cast(arg1, v4v_viptables_rule_t);
+                rc = -EPERM;
+                if ( !IS_PRIV(d) )
+                    goto out;
+
+                write_lock(&viprules_lock);
+                rc = v4v_viptables_del(d, rule_hnd, position);
+                write_unlock(&viprules_lock);
+                break;
+            }
+        case V4VOP_viptables_list:
+            {
+                XEN_GUEST_HANDLE(v4v_viptables_list_t) rules_list_hnd =
+                    guest_handle_cast(arg1, v4v_viptables_list_t);
+                rc = -EPERM;
+                if ( !IS_PRIV(d) )
+                    goto out;
+
+                rc = -EFAULT;
+                if ( unlikely(!guest_handle_okay(rules_list_hnd, 1)) )
+                    goto out;
+
+                read_lock(&viprules_lock);
+                rc = v4v_viptables_list(d, rules_list_hnd);
+                read_unlock(&viprules_lock);
+                break;
+            }
+        case V4VOP_info:
+            {
+                XEN_GUEST_HANDLE(v4v_info_t) info_hnd =
+                    guest_handle_cast(arg1, v4v_info_t);
+                v4v_info_t info;
+
+                if ( unlikely(!guest_handle_okay(info_hnd, 1)) )
+                    goto out;
+                v4v_info(d, &info);
+                if ( copy_to_guest(info_hnd, &info, 1) )
+                    goto out;
+                rc = 0;
+                break;
+            }
+        default:
+            rc = -ENOSYS;
+            break;
+    }
+out:
+    domain_unlock(d);
+    v4v_dprintk("<-do_v4v_op()=%d\n", (int)rc);
+    return rc;
+}
+
+/*
+ * init
+ */
+
+void
+v4v_destroy(struct domain *d)
+{
+    int i;
+
+    BUG_ON(!d->is_dying);
+    write_lock(&v4v_lock);
+
+    v4v_dprintk("d->v=%p\n", d->v4v);
+
+    if ( d->v4v )
+    {
+        for ( i = 0; i < V4V_HTABLE_SIZE; ++i )
+        {
+            struct hlist_node *node, *next;
+            struct v4v_ring_info *ring_info;
+
+            hlist_for_each_entry_safe(ring_info, node,
+                    next, &d->v4v->ring_hash[i],
+                    node)
+            {
+                v4v_ring_remove_info(d, ring_info);
+            }
+        }
+    }
+
+    d->v4v = NULL;
+    write_unlock(&v4v_lock);
+}
+
+int
+v4v_init(struct domain *d)
+{
+    struct v4v_domain *v4v;
+    evtchn_port_t port;
+    int i;
+    int rc;
+
+    v4v = xmalloc(struct v4v_domain);
+    if ( !v4v )
+        return -ENOMEM;
+
+    rc = evtchn_alloc_unbound_domain(d, &port, d->domain_id);
+    if ( rc )
+        return rc;
+
+    rwlock_init(&v4v->lock);
+
+    v4v->evtchn_port = port;
+    for ( i = 0; i < V4V_HTABLE_SIZE; ++i )
+        INIT_HLIST_HEAD(&v4v->ring_hash[i]);
+
+    write_lock(&v4v_lock);
+    d->v4v = v4v;
+    write_unlock(&v4v_lock);
+
+    return 0;
+}
+
+
+/*
+ * debug
+ */
+
+static void
+dump_domain_ring(struct domain *d, struct v4v_ring_info *ring_info)
+{
+    uint32_t rx_ptr;
+
+    printk(KERN_ERR "  ring: domid=%d port=0x%08x partner=%d npage=%d\n",
+           (int)d->domain_id, (int)ring_info->id.addr.port,
+           (int)ring_info->id.partner, (int)ring_info->npage);
+
+    if ( v4v_ringbuf_get_rx_ptr(d, ring_info, &rx_ptr) )
+    {
+        printk(KERN_ERR "   Failed to read rx_ptr\n");
+        return;
+    }
+
+    printk(KERN_ERR "   tx_ptr=%d rx_ptr=%d len=%d\n",
+           (int)ring_info->tx_ptr, (int)rx_ptr, (int)ring_info->len);
+}
+
+static void
+dump_domain(struct domain *d)
+{
+    int i;
+
+    printk(KERN_ERR " domain %d:\n", (int)d->domain_id);
+
+    read_lock(&d->v4v->lock);
+
+    for ( i = 0; i < V4V_HTABLE_SIZE; ++i )
+    {
+        struct hlist_node *node;
+        struct v4v_ring_info *ring_info;
+
+        hlist_for_each_entry(ring_info, node, &d->v4v->ring_hash[i], node)
+            dump_domain_ring(d, ring_info);
+    }
+
+    printk(KERN_ERR "  event channel: %d\n",  d->v4v->evtchn_port);
+    read_unlock(&d->v4v->lock);
+
+    printk(KERN_ERR "\n");
+    v4v_signal_domain(d);
+}
+
+static void
+dump_state(unsigned char key)
+{
+    struct domain *d;
+
+    printk(KERN_ERR "\n\nV4V:\n");
+    read_lock(&v4v_lock);
+
+    rcu_read_lock(&domlist_read_lock);
+
+    for_each_domain(d)
+        dump_domain(d);
+
+    rcu_read_unlock(&domlist_read_lock);
+
+    read_unlock(&v4v_lock);
+}
+
+struct keyhandler v4v_info_keyhandler =
+{
+    .diagnostic = 1,
+    .u.fn = dump_state,
+    .desc = "dump v4v states and interupt"
+};
+
+static int __init
+setup_dump_rings(void)
+{
+    register_keyhandler('4', &v4v_info_keyhandler);
+    return 0;
+}
+
+__initcall(setup_dump_rings);
+
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/public/v4v.h b/xen/include/public/v4v.h
new file mode 100644
index 0000000..ad8bb0e
--- /dev/null
+++ b/xen/include/public/v4v.h
@@ -0,0 +1,305 @@
+/******************************************************************************
+ * V4V
+ *
+ * Version 2 of v2v (Virtual-to-Virtual)
+ *
+ * Copyright (c) 2010, Citrix Systems
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#ifndef __XEN_PUBLIC_V4V_H__
+#define __XEN_PUBLIC_V4V_H__
+
+#include "xen.h"
+#include "event_channel.h"
+
+/*
+ * Structure definitions
+ */
+
+#define V4V_RING_MAGIC          0xA822f72bb0b9d8ccULL
+#define V4V_RING_DATA_MAGIC	0x45fe852220b801d4ULL
+
+#define V4V_MESSAGE_DGRAM       0x3c2c1db8
+#define V4V_MESSAGE_STREAM 	0x70f6a8e5
+
+#define V4V_DOMID_ANY           DOMID_INVALID
+#define V4V_PORT_ANY            0
+
+typedef struct v4v_iov
+{
+    uint64_t iov_base;
+    uint32_t iov_len;
+} v4v_iov_t;
+
+typedef struct v4v_addr
+{
+    uint32_t port;
+    domid_t domain;
+    uint16_t pad;
+} v4v_addr_t;
+
+typedef struct v4v_ring_id
+{
+    v4v_addr_t addr;
+    domid_t partner;
+    uint16_t pad;
+} v4v_ring_id_t;
+
+typedef struct
+{
+    v4v_addr_t src;
+    v4v_addr_t dst;
+} v4v_send_addr_t;
+
+/*
+ * v4v_ring
+ * id: xen only looks at this during register/unregister
+ *     and will fill in id.addr.domain
+ * rx_ptr: rx pointer, modified by domain
+ * tx_ptr: tx pointer, modified by xen
+ *
+ */
+struct v4v_ring
+{
+    uint64_t magic;
+    v4v_ring_id_t id;
+    uint32_t len;
+    uint32_t rx_ptr;
+    uint32_t tx_ptr;
+    uint8_t reserved[32];
+    uint8_t ring[0];
+};
+typedef struct v4v_ring v4v_ring_t;
+
+#define V4V_RING_DATA_F_EMPTY       (1U << 0) /* Ring is empty */
+#define V4V_RING_DATA_F_EXISTS      (1U << 1) /* Ring exists */
+#define V4V_RING_DATA_F_PENDING     (1U << 2) /* Pending interrupt exists - do not
+                                               * rely on this field - for
+                                               * profiling only */
+#define V4V_RING_DATA_F_SUFFICIENT  (1U << 3) /* Sufficient space to queue
+                                               * space_required bytes exists */
+
+typedef struct v4v_ring_data_ent
+{
+    v4v_addr_t ring;
+    uint16_t flags;
+    uint16_t pad;
+    uint32_t space_required;
+    uint32_t max_message_size;
+} v4v_ring_data_ent_t;
+
+typedef struct v4v_ring_data
+{
+    uint64_t magic;
+    uint32_t nent;
+    uint32_t pad;
+    uint64_t reserved[4];
+    v4v_ring_data_ent_t data[0];
+} v4v_ring_data_t;
+
+struct v4v_info
+{
+    uint64_t ring_magic;
+    uint64_t data_magic;
+    evtchn_port_t evtchn;
+    uint32_t pad;
+};
+typedef struct v4v_info v4v_info_t;
+
+#define V4V_SHF_SYN		(1 << 0)
+#define V4V_SHF_ACK		(1 << 1)
+#define V4V_SHF_RST		(1 << 2)
+
+#define V4V_SHF_PING		(1 << 8)
+#define V4V_SHF_PONG		(1 << 9)
+
+struct v4v_stream_header
+{
+    uint32_t flags;
+    uint32_t conid;
+};
+
+struct v4v_ring_message_header
+{
+    uint32_t len;
+    uint32_t pad0;
+    v4v_addr_t source;
+    uint32_t message_type;
+    uint32_t pad1;
+    uint8_t data[0];
+};
+
+typedef struct v4v_viptables_rule
+{
+    v4v_addr_t src;
+    v4v_addr_t dst;
+    uint32_t accept;
+    uint32_t pad;
+} v4v_viptables_rule_t;
+
+typedef struct v4v_viptables_list
+{
+    uint32_t start_rule;
+    uint32_t nb_rules;
+    struct v4v_viptables_rule rules[0];
+} v4v_viptables_list_t;
+
+/*
+ * HYPERCALLS
+ */
+
+/*
+ * V4VOP_register_ring
+ *
+ * Registers a ring with Xen, if a ring with the same v4v_ring_id exists,
+ * this ring takes its place, registration will not change tx_ptr
+ * unless it is invalid
+ *
+ * do_v4v_op(V4VOP_unregister_ring,
+ *           v4v_ring, XEN_GUEST_HANDLE(xen_pfn_t),
+ *           npage, 0)
+ */
+#define V4VOP_register_ring 	1
+
+
+/*
+ * V4VOP_unregister_ring
+ *
+ * Unregister a ring.
+ *
+ * do_v4v_op(V4VOP_send, v4v_ring, NULL, 0, 0)
+ */
+#define V4VOP_unregister_ring 	2
+
+/*
+ * V4VOP_send
+ *
+ * Sends len bytes of buf to dst, giving src as the source address (xen will
+ * ignore src->domain and put your domain in the actually message), xen
+ * first looks for a ring with id.addr==dst and id.partner==sending_domain
+ * if that fails it looks for id.addr==dst and id.partner==DOMID_ANY.
+ * message_type is the 32 bit number used from the message
+ * most likely V4V_MESSAGE_DGRAM or V4V_MESSAGE_STREAM. If insufficient space exists
+ * it will return -EAGAIN and xen will twing the V4V_INTERRUPT when
+ * sufficient space becomes available
+ *
+ * do_v4v_op(V4VOP_send,
+ *           v4v_send_addr_t addr,
+ *           void* buf,
+ *           uint32_t len,
+ *           uint32_t message_type)
+ */
+#define V4VOP_send 		3
+
+
+/*
+ * V4VOP_notify
+ *
+ * Asks xen for information about other rings in the system
+ *
+ * ent->ring is the v4v_addr_t of the ring you want information on
+ * the same matching rules are used as for V4VOP_send.
+ *
+ * ent->space_required  if this field is not null xen will check
+ * that there is space in the destination ring for this many bytes
+ * of payload. If there is it will set the V4V_RING_DATA_F_SUFFICIENT
+ * and CANCEL any pending interrupt for that ent->ring, if insufficient
+ * space is available it will schedule an interrupt and the flag will
+ * not be set.
+ *
+ * The flags are set by xen when notify replies
+ * V4V_RING_DATA_F_EMPTY	ring is empty
+ * V4V_RING_DATA_F_PENDING	interrupt is pending - don't rely on this
+ * V4V_RING_DATA_F_SUFFICIENT	sufficient space for space_required is there
+ * V4V_RING_DATA_F_EXISTS	ring exists
+ *
+ * do_v4v_op(V4VOP_notify,
+ *           XEN_GUEST_HANDLE(v4v_ring_data_ent) ent,
+ *           NULL, nent, 0)
+ */
+#define V4VOP_notify 		4
+
+/*
+ * V4VOP_sendv
+ *
+ * Identical to V4VOP_send except rather than buf and len it takes
+ * an array of v4v_iov and a length of the array.
+ *
+ * do_v4v_op(V4VOP_sendv,
+ *           v4v_send_addr_t addr,
+ *           v4v_iov iov,
+ *           uint32_t niov,
+ *           uint32_t message_type)
+ */
+#define V4VOP_sendv		5
+
+/*
+ * V4VOP_viptables_add
+ *
+ * Insert a filtering rules after a given position.
+ *
+ * do_v4v_op(V4VOP_viptables_add,
+ *           v4v_viptables_rule_t rule,
+ *           NULL,
+ *           uint32_t position, 0)
+ */
+#define V4VOP_viptables_add     6
+
+/*
+ * V4VOP_viptables_del
+ *
+ * Delete a filtering rules at a position or the rule
+ * that matches "rule".
+ *
+ * do_v4v_op(V4VOP_viptables_del,
+ *           v4v_viptables_rule_t rule,
+ *           NULL,
+ *           uint32_t position, 0)
+ */
+#define V4VOP_viptables_del     7
+
+/*
+ * V4VOP_viptables_list
+ *
+ * Delete a filtering rules at a position or the rule
+ * that matches "rule".
+ *
+ * do_v4v_op(V4VOP_viptables_list,
+ *               v4v_vitpables_list_t list,
+ *               NULL, 0, 0)
+ */
+#define V4VOP_viptables_list    8
+
+/*
+ * V4VOP_info
+ *
+ * Returns v4v info for the current domain (domain that issued the hypercall).
+ *      - V4V magic number
+ *      - event channel port (for current domain)
+ *
+ * do_v4v_op(V4VOP_info,
+ *           XEN_GUEST_HANDLE(v4v_info_t) info,
+ *           NULL, 0, 0)
+ */
+#define V4VOP_info              9
+
+#endif /* __XEN_PUBLIC_V4V_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index b19425b..868d119 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -99,7 +99,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_pfn_t);
 #define __HYPERVISOR_domctl               36
 #define __HYPERVISOR_kexec_op             37
 #define __HYPERVISOR_tmem_op              38
-#define __HYPERVISOR_xc_reserved_op       39 /* reserved for XenClient */
+#define __HYPERVISOR_v4v_op               39
 
 /* Architecture-specific hypercall definitions. */
 #define __HYPERVISOR_arch_0               48
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 53804c8..296de52 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -23,6 +23,7 @@
 #include <public/sysctl.h>
 #include <public/vcpu.h>
 #include <public/mem_event.h>
+#include <xen/v4v.h>
 
 #ifdef CONFIG_COMPAT
 #include <compat/vcpu.h>
@@ -350,6 +351,9 @@ struct domain
     nodemask_t node_affinity;
     unsigned int last_alloc_node;
     spinlock_t node_affinity_lock;
+
+    /* v4v */
+    struct v4v_domain *v4v;
 };
 
 struct domain_setup_info
diff --git a/xen/include/xen/v4v.h b/xen/include/xen/v4v.h
new file mode 100644
index 0000000..05d20b5
--- /dev/null
+++ b/xen/include/xen/v4v.h
@@ -0,0 +1,133 @@
+/******************************************************************************
+ * V4V
+ *
+ * Version 2 of v2v (Virtual-to-Virtual)
+ *
+ * Copyright (c) 2010, Citrix Systems
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#ifndef __V4V_PRIVATE_H__
+#define __V4V_PRIVATE_H__
+
+#include <xen/config.h>
+#include <xen/types.h>
+#include <xen/spinlock.h>
+#include <xen/smp.h>
+#include <xen/shared.h>
+#include <xen/list.h>
+#include <public/v4v.h>
+
+#define V4V_HTABLE_SIZE 32
+
+/*
+ * Handlers
+ */
+
+DEFINE_XEN_GUEST_HANDLE (v4v_iov_t);
+DEFINE_XEN_GUEST_HANDLE (v4v_addr_t);
+DEFINE_XEN_GUEST_HANDLE (v4v_send_addr_t);
+DEFINE_XEN_GUEST_HANDLE (v4v_ring_t);
+DEFINE_XEN_GUEST_HANDLE (v4v_ring_data_ent_t);
+DEFINE_XEN_GUEST_HANDLE (v4v_ring_data_t);
+DEFINE_XEN_GUEST_HANDLE (v4v_info_t);
+
+DEFINE_XEN_GUEST_HANDLE (v4v_viptables_rule_t);
+DEFINE_XEN_GUEST_HANDLE (v4v_viptables_list_t);
+
+/*
+ * Helper functions
+ */
+
+static inline uint16_t
+v4v_hash_fn (v4v_ring_id_t *id)
+{
+    uint16_t ret;
+    ret = (uint16_t) (id->addr.port >> 16);
+    ret ^= (uint16_t) id->addr.port;
+    ret ^= id->addr.domain;
+    ret ^= id->partner;
+
+    ret &= (V4V_HTABLE_SIZE-1);
+
+    return ret;
+}
+
+struct v4v_pending_ent
+{
+    struct hlist_node node;
+    domid_t id;
+    uint32_t len;
+};
+
+
+struct v4v_ring_info
+{
+    /* next node in the hash, protected by L2  */
+    struct hlist_node node;
+    /* this ring's id, protected by L2 */
+    v4v_ring_id_t id;
+    /* L3 */
+    spinlock_t lock;
+    /* cached length of the ring (from ring->len), protected by L3 */
+    uint32_t len;
+    uint32_t npage;
+    /* cached tx pointer location, protected by L3 */
+    uint32_t tx_ptr;
+    /* guest ring, protected by L3 */
+    XEN_GUEST_HANDLE(v4v_ring_t) ring;
+    /* mapped ring pages protected by L3*/
+    uint8_t **mfn_mapping;
+    /* list of mfns of guest ring */
+    mfn_t *mfns;
+    /* list of struct v4v_pending_ent for this ring, L3 */
+    struct hlist_head pending;
+};
+
+/*
+ * The value of the v4v element in a struct domain is
+ * protected by the global lock L1
+ */
+struct v4v_domain
+{
+    /* L2 */
+    rwlock_t lock;
+    /* event channel */
+    evtchn_port_t evtchn_port;
+    /* protected by L2 */
+    struct hlist_head ring_hash[V4V_HTABLE_SIZE];
+};
+
+typedef struct v4v_viptables_rule_node
+{
+    struct list_head list;
+    v4v_viptables_rule_t rule;
+} v4v_viptables_rule_node_t;
+
+void v4v_destroy(struct domain *d);
+int v4v_init(struct domain *d);
+long do_v4v_op (int cmd,
+                XEN_GUEST_HANDLE (void) arg1,
+                XEN_GUEST_HANDLE (void) arg2,
+                uint32_t arg3,
+                uint32_t arg4);
+
+#endif /* __V4V_PRIVATE_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2012-10-08 10:24 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-20 11:41 [PATCH 0/4] Add V4V to Xen (v6) Jean Guyader
2012-09-20 11:41 ` [PATCH 1/4] xen: Introduce guest_handle_for_field Jean Guyader
2012-09-20 11:42 ` [PATCH 2/4] xen: virq, remove VIRQ_XC_RESERVED Jean Guyader
2012-09-20 11:42 ` [PATCH 3/4] xen: events, exposes evtchn_alloc_unbound_domain Jean Guyader
2012-09-20 11:42 ` [PATCH 4/4] xen: Add V4V implementation Jean Guyader
2012-09-20 12:20   ` Jan Beulich
2012-10-04 12:03     ` Jean Guyader
2012-10-04 12:11       ` Jan Beulich
2012-10-04 12:41         ` Jean Guyader
2012-10-04 15:03         ` Jean Guyader
2012-10-04 15:12           ` Tim Deegan
2012-10-04 17:00             ` Jean Guyader
2012-10-05  8:52               ` Jan Beulich
2012-10-05  9:07                 ` Jan Beulich
2012-10-08 10:24                   ` Jean Guyader
     [not found]         ` <CAEBdQ92Box+2PMohpDi6Sy=PXFmP_sxYb0ZYSOCU4U3iPdocqQ@mail.gmail.com>
2012-10-04 15:07           ` Jan Beulich
2012-09-22 13:47   ` Julian Pidancet
2012-09-27  9:37     ` Jean Guyader
  -- strict thread matches above, loose matches on Subject: below --
2012-09-20  9:47 [PATCH 0/4] Add V4V to Xen v5 Jean Guyader
2012-09-20  9:47 ` [PATCH 4/4] xen: Add V4V implementation Jean Guyader
2012-09-13 18:09 [RFC][PATCH 0/4] Add V4V to Xen Jean Guyader
2012-09-13 18:09 ` [PATCH 4/4] xen: Add V4V implementation Jean Guyader
2012-09-14  9:54   ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.