All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] RFC: V4V (v2)
@ 2012-06-28 16:26 Jean Guyader
  2012-06-28 16:26 ` [PATCH 1/5] xen: add ssize_t Jean Guyader
                   ` (4 more replies)
  0 siblings, 5 replies; 29+ messages in thread
From: Jean Guyader @ 2012-06-28 16:26 UTC (permalink / raw)
  To: xen-devel; +Cc: Jean Guyader

[-- Attachment #1: Type: text/plain, Size: 1612 bytes --]

V4V is a copy based inter vm communication system.

Please have a look at this thread for more detail
about V4V.
http://lists.xen.org/archives/html/xen-devel/2012-05/msg01866.html

This patch series is work in progress but I wanted to
post it early enough so I can get feedback from people.

v2 changes:
  - Cleanup plugin header
  - Include basic access control
  - Use guest_handle_for_field
changes requested not a v2:
  - Switch to event channel instead of virq

Jean Guyader (5):
  xen: add ssize_t
  v4v: Introduce VIRQ_V4V
  xen: Enforce introduce guest_handle_for_field
  xen: Add V4V implementation
  v4v: Introduce basic access control to V4V

 xen/arch/x86/hvm/hvm.c             |    9 +-
 xen/arch/x86/x86_32/entry.S        |    2 +
 xen/arch/x86/x86_64/compat/entry.S |    2 +
 xen/arch/x86/x86_64/entry.S        |    2 +
 xen/common/Makefile                |    1 +
 xen/common/domain.c                |   11 +-
 xen/common/event_channel.c         |    1 +
 xen/common/v4v.c                   | 2020 ++++++++++++++++++++++++++++++++++++
 xen/include/asm-arm/types.h        |    1 +
 xen/include/asm-x86/guest_access.h |    3 +
 xen/include/asm-x86/types.h        |    6 +
 xen/include/public/v4v.h           |  243 +++++
 xen/include/public/xen.h           |    4 +-
 xen/include/xen/sched.h            |    5 +
 xen/include/xen/v4v.h              |  213 ++++
 15 files changed, 2517 insertions(+), 6 deletions(-)
 create mode 100644 xen/common/v4v.c
 create mode 100644 xen/include/public/v4v.h
 create mode 100644 xen/include/xen/v4v.h

-- 
1.7.9.5


[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/5] xen: add ssize_t
  2012-06-28 16:26 [PATCH 0/5] RFC: V4V (v2) Jean Guyader
@ 2012-06-28 16:26 ` Jean Guyader
  2012-06-29  8:05   ` Jan Beulich
  2012-06-28 16:26 ` [PATCH 2/5] v4v: Introduce VIRQ_V4V Jean Guyader
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 29+ messages in thread
From: Jean Guyader @ 2012-06-28 16:26 UTC (permalink / raw)
  To: xen-devel; +Cc: Jean Guyader

[-- Attachment #1: Type: text/plain, Size: 182 bytes --]


Signed-off-by: Jean Guyader <jean.guyader@citrix.com>
---
 xen/include/asm-arm/types.h |    1 +
 xen/include/asm-x86/types.h |    6 ++++++
 2 files changed, 7 insertions(+)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-xen-add-ssize_t.patch --]
[-- Type: text/x-patch; name="0001-xen-add-ssize_t.patch", Size: 852 bytes --]

diff --git a/xen/include/asm-arm/types.h b/xen/include/asm-arm/types.h
index 48864f9..d2c5612 100644
--- a/xen/include/asm-arm/types.h
+++ b/xen/include/asm-arm/types.h
@@ -35,6 +35,7 @@ typedef u64 paddr_t;
 #define PRIpaddr "016llx"
 
 typedef unsigned long size_t;
+typedef long ssize_t;
 
 typedef char bool_t;
 #define test_and_set_bool(b)   xchg(&(b), 1)
diff --git a/xen/include/asm-x86/types.h b/xen/include/asm-x86/types.h
index 1c4c5d5..bb7ffc2 100644
--- a/xen/include/asm-x86/types.h
+++ b/xen/include/asm-x86/types.h
@@ -59,6 +59,12 @@ typedef char bool_t;
 #define test_and_set_bool(b)   xchg(&(b), 1)
 #define test_and_clear_bool(b) xchg(&(b), 0)
 
+#if defined(__i386__)
+typedef int ssize_t;
+#else /* __x86_64 */
+typedef long ssize_t;
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __X86_TYPES_H__ */

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 2/5] v4v: Introduce VIRQ_V4V
  2012-06-28 16:26 [PATCH 0/5] RFC: V4V (v2) Jean Guyader
  2012-06-28 16:26 ` [PATCH 1/5] xen: add ssize_t Jean Guyader
@ 2012-06-28 16:26 ` Jean Guyader
  2012-06-29  8:07   ` Jan Beulich
  2012-06-28 16:26 ` [PATCH 3/5] xen: Enforce introduce guest_handle_for_field Jean Guyader
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 29+ messages in thread
From: Jean Guyader @ 2012-06-28 16:26 UTC (permalink / raw)
  To: xen-devel; +Cc: Jean Guyader

[-- Attachment #1: Type: text/plain, Size: 229 bytes --]


Introduce the global virq VIRQ_V4V

Signed-off-by: Jean Guyader <jean.guyader@citrix.com>
---
 xen/common/event_channel.c |    1 +
 xen/include/public/xen.h   |    2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0002-v4v-Introduce-VIRQ_V4V.patch --]
[-- Type: text/x-patch; name="0002-v4v-Introduce-VIRQ_V4V.patch", Size: 1148 bytes --]

diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 53777f8..e138600 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -107,6 +107,7 @@ static int virq_is_global(uint32_t virq)
     case VIRQ_TIMER:
     case VIRQ_DEBUG:
     case VIRQ_XENOPROF:
+    case VIRQ_V4V:
         rc = 0;
         break;
     case VIRQ_ARCH_0 ... VIRQ_ARCH_7:
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index b2f6c50..033cbba 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -157,7 +157,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_pfn_t);
 #define VIRQ_CON_RING   8  /* G. (DOM0) Bytes received on console            */
 #define VIRQ_PCPU_STATE 9  /* G. (DOM0) PCPU state changed                   */
 #define VIRQ_MEM_EVENT  10 /* G. (DOM0) A memory event has occured           */
-#define VIRQ_XC_RESERVED 11 /* G. Reserved for XenClient                     */
+#define VIRQ_V4V        11 /* G. V4V event has occurred                      */
 #define VIRQ_ENOMEM     12 /* G. (DOM0) Low on heap memory       */
 
 /* Architecture-specific VIRQ definitions. */

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 3/5] xen: Enforce introduce guest_handle_for_field
  2012-06-28 16:26 [PATCH 0/5] RFC: V4V (v2) Jean Guyader
  2012-06-28 16:26 ` [PATCH 1/5] xen: add ssize_t Jean Guyader
  2012-06-28 16:26 ` [PATCH 2/5] v4v: Introduce VIRQ_V4V Jean Guyader
@ 2012-06-28 16:26 ` Jean Guyader
  2012-06-29  8:10   ` Jan Beulich
  2012-06-28 16:26 ` [PATCH 4/5] xen: Add V4V implementation Jean Guyader
  2012-06-28 16:26 ` [PATCH 5/5] v4v: Introduce basic access control to V4V Jean Guyader
  4 siblings, 1 reply; 29+ messages in thread
From: Jean Guyader @ 2012-06-28 16:26 UTC (permalink / raw)
  To: xen-devel; +Cc: Jean Guyader

[-- Attachment #1: Type: text/plain, Size: 213 bytes --]


This helper turns a field of a GUEST_HANDLE in
a GUEST_HANDLE.

Signed-off-by: Jean Guyader <jean.guyader@citrix.com>
---
 xen/include/asm-x86/guest_access.h |    3 +++
 1 file changed, 3 insertions(+)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0003-xen-Enforce-introduce-guest_handle_for_field.patch --]
[-- Type: text/x-patch; name="0003-xen-Enforce-introduce-guest_handle_for_field.patch", Size: 547 bytes --]

diff --git a/xen/include/asm-x86/guest_access.h b/xen/include/asm-x86/guest_access.h
index 2b429c2..e3ac1d6 100644
--- a/xen/include/asm-x86/guest_access.h
+++ b/xen/include/asm-x86/guest_access.h
@@ -51,6 +51,9 @@
     (XEN_GUEST_HANDLE(type)) { _x };            \
 })
 
+#define guest_handle_for_field(hnd, type, fld)          \
+    ((XEN_GUEST_HANDLE(type)) { &(hnd).p->fld })
+
 #define guest_handle_from_ptr(ptr, type)        \
     ((XEN_GUEST_HANDLE(type)) { (type *)ptr })
 #define const_guest_handle_from_ptr(ptr, type)  \

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 4/5] xen: Add V4V implementation
  2012-06-28 16:26 [PATCH 0/5] RFC: V4V (v2) Jean Guyader
                   ` (2 preceding siblings ...)
  2012-06-28 16:26 ` [PATCH 3/5] xen: Enforce introduce guest_handle_for_field Jean Guyader
@ 2012-06-28 16:26 ` Jean Guyader
  2012-06-29  8:33   ` Jan Beulich
  2012-07-05 11:36   ` Tim Deegan
  2012-06-28 16:26 ` [PATCH 5/5] v4v: Introduce basic access control to V4V Jean Guyader
  4 siblings, 2 replies; 29+ messages in thread
From: Jean Guyader @ 2012-06-28 16:26 UTC (permalink / raw)
  To: xen-devel; +Cc: Jean Guyader

[-- Attachment #1: Type: text/plain, Size: 946 bytes --]


Setup of v4v domains a domain gets created and cleanup
when a domain die. Wire up the v4v hypercall.

Include v4v internal and public headers.

Signed-off-by: Jean Guyader <jean.guyader@citrix.com>
---
 xen/arch/x86/hvm/hvm.c             |    9 +-
 xen/arch/x86/x86_32/entry.S        |    2 +
 xen/arch/x86/x86_64/compat/entry.S |    2 +
 xen/arch/x86/x86_64/entry.S        |    2 +
 xen/common/Makefile                |    1 +
 xen/common/domain.c                |   11 +-
 xen/common/v4v.c                   | 1755 ++++++++++++++++++++++++++++++++++++
 xen/include/public/v4v.h           |  240 +++++
 xen/include/public/xen.h           |    2 +-
 xen/include/xen/sched.h            |    5 +
 xen/include/xen/v4v.h              |  187 ++++
 11 files changed, 2211 insertions(+), 5 deletions(-)
 create mode 100644 xen/common/v4v.c
 create mode 100644 xen/include/public/v4v.h
 create mode 100644 xen/include/xen/v4v.h


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0004-xen-Add-V4V-implementation.patch --]
[-- Type: text/x-patch; name="0004-xen-Add-V4V-implementation.patch", Size: 66368 bytes --]

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index e0d495d..6f2d70e 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3124,7 +3124,8 @@ static hvm_hypercall_t *hvm_hypercall32_table[NR_hypercalls] = {
     HYPERCALL(set_timer_op),
     HYPERCALL(hvm_op),
     HYPERCALL(sysctl),
-    HYPERCALL(tmem_op)
+    HYPERCALL(tmem_op),
+    HYPERCALL(v4v_op)
 };
 
 #else /* defined(__x86_64__) */
@@ -3209,7 +3210,8 @@ static hvm_hypercall_t *hvm_hypercall64_table[NR_hypercalls] = {
     HYPERCALL(set_timer_op),
     HYPERCALL(hvm_op),
     HYPERCALL(sysctl),
-    HYPERCALL(tmem_op)
+    HYPERCALL(tmem_op),
+    HYPERCALL(v4v_op)
 };
 
 #define COMPAT_CALL(x)                                        \
@@ -3226,7 +3228,8 @@ static hvm_hypercall_t *hvm_hypercall32_table[NR_hypercalls] = {
     COMPAT_CALL(set_timer_op),
     HYPERCALL(hvm_op),
     HYPERCALL(sysctl),
-    HYPERCALL(tmem_op)
+    HYPERCALL(tmem_op),
+    HYPERCALL(v4v_op)
 };
 
 #endif /* defined(__x86_64__) */
diff --git a/xen/arch/x86/x86_32/entry.S b/xen/arch/x86/x86_32/entry.S
index 2982679..b3e0da4 100644
--- a/xen/arch/x86/x86_32/entry.S
+++ b/xen/arch/x86/x86_32/entry.S
@@ -700,6 +700,7 @@ ENTRY(hypercall_table)
         .long do_domctl
         .long do_kexec_op
         .long do_tmem_op
+        .long do_v4v_op
         .rept __HYPERVISOR_arch_0-((.-hypercall_table)/4)
         .long do_ni_hypercall
         .endr
@@ -748,6 +749,7 @@ ENTRY(hypercall_args_table)
         .byte 1 /* do_domctl            */
         .byte 2 /* do_kexec_op          */
         .byte 1 /* do_tmem_op           */
+        .byte 6 /* do_v4v_op		*/
         .rept __HYPERVISOR_arch_0-(.-hypercall_args_table)
         .byte 0 /* do_ni_hypercall      */
         .endr
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index f49ff2d..28615f9 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -414,6 +414,7 @@ ENTRY(compat_hypercall_table)
         .quad do_domctl
         .quad compat_kexec_op
         .quad do_tmem_op
+        .quad do_v4v_op
         .rept __HYPERVISOR_arch_0-((.-compat_hypercall_table)/8)
         .quad compat_ni_hypercall
         .endr
@@ -462,6 +463,7 @@ ENTRY(compat_hypercall_args_table)
         .byte 1 /* do_domctl                */
         .byte 2 /* compat_kexec_op          */
         .byte 1 /* do_tmem_op               */
+        .byte 6 /* do_v4v_op		    */
         .rept __HYPERVISOR_arch_0-(.-compat_hypercall_args_table)
         .byte 0 /* compat_ni_hypercall      */
         .endr
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 3836260..918fa59 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -699,6 +699,7 @@ ENTRY(hypercall_table)
         .quad do_domctl
         .quad do_kexec_op
         .quad do_tmem_op
+        .quad do_v4v_op
         .rept __HYPERVISOR_arch_0-((.-hypercall_table)/8)
         .quad do_ni_hypercall
         .endr
@@ -747,6 +748,7 @@ ENTRY(hypercall_args_table)
         .byte 1 /* do_domctl            */
         .byte 2 /* do_kexec             */
         .byte 1 /* do_tmem_op           */
+        .byte 6 /* do_v4v_op		*/
         .rept __HYPERVISOR_arch_0-(.-hypercall_args_table)
         .byte 0 /* do_ni_hypercall      */
         .endr
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 9eba8bc..fe3c72c 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -45,6 +45,7 @@ obj-y += tmem_xen.o
 obj-y += radix-tree.o
 obj-y += rbtree.o
 obj-y += lzo.o
+obj-y += v4v.o
 
 obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma unlzo,$(n).init.o)
 
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 8840202..9539d88 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -195,7 +195,8 @@ struct domain *domain_create(
 {
     struct domain *d, **pd;
     enum { INIT_xsm = 1u<<0, INIT_watchdog = 1u<<1, INIT_rangeset = 1u<<2,
-           INIT_evtchn = 1u<<3, INIT_gnttab = 1u<<4, INIT_arch = 1u<<5 };
+           INIT_evtchn = 1u<<3, INIT_gnttab = 1u<<4, INIT_arch = 1u<<5,
+           INIT_v4v = 1u<<6 };
     int init_status = 0;
     int poolid = CPUPOOLID_NONE;
 
@@ -219,6 +220,7 @@ struct domain *domain_create(
     spin_lock_init(&d->hypercall_deadlock_mutex);
     INIT_PAGE_LIST_HEAD(&d->page_list);
     INIT_PAGE_LIST_HEAD(&d->xenpage_list);
+    rwlock_init(&d->v4v_lock);
 
     spin_lock_init(&d->node_affinity_lock);
     d->node_affinity = NODE_MASK_ALL;
@@ -274,6 +276,10 @@ struct domain *domain_create(
             goto fail;
         init_status |= INIT_gnttab;
 
+        if ( v4v_init(d) != 0 )
+            goto fail;
+        init_status |= INIT_v4v;
+
         poolid = 0;
 
         d->mem_event = xzalloc(struct mem_event_per_domain);
@@ -313,6 +319,8 @@ struct domain *domain_create(
     xfree(d->mem_event);
     if ( init_status & INIT_arch )
         arch_domain_destroy(d);
+    if ( init_status & INIT_v4v )
+	v4v_destroy(d);
     if ( init_status & INIT_gnttab )
         grant_table_destroy(d);
     if ( init_status & INIT_evtchn )
@@ -466,6 +474,7 @@ int domain_kill(struct domain *d)
         domain_pause(d);
         d->is_dying = DOMDYING_dying;
         spin_barrier(&d->domain_lock);
+        v4v_destroy(d);
         evtchn_destroy(d);
         gnttab_release_mappings(d);
         tmem_destroy(d->tmem);
diff --git a/xen/common/v4v.c b/xen/common/v4v.c
new file mode 100644
index 0000000..e589fda
--- /dev/null
+++ b/xen/common/v4v.c
@@ -0,0 +1,1755 @@
+/******************************************************************************
+ * V4V
+ *
+ * Version 2 of v2v (Virtual-to-Virtual)
+ *
+ * Copyright (c) 2010, Citrix Systems
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#include <xen/config.h>
+#include <xen/mm.h>
+#include <xen/compat.h>
+#include <xen/init.h>
+#include <xen/lib.h>
+#include <xen/errno.h>
+#include <xen/sched.h>
+#include <xen/domain.h>
+#include <xen/v4v.h>
+#include <xen/event.h>
+#include <xen/guest_access.h>
+#include <asm/paging.h>
+#include <asm/p2m.h>
+#include <xen/keyhandler.h>
+#include <xen/v4v_utils.h>
+
+#ifdef V4V_DEBUG
+#define MY_FILE "v4v.c"
+#define v4v_dprintk(format, args...)                    \
+    do {                                                \
+        printk("%s:%d " format,                         \
+               MY_FILE, __LINE__, ## args );            \
+    } while ( 1 == 0 )
+#else
+#define v4v_dprintk(format, ... ) (void)0
+#endif
+
+
+
+DEFINE_XEN_GUEST_HANDLE (uint8_t);
+static struct v4v_ring_info *v4v_ring_find_info (struct domain *d,
+                                                 struct v4v_ring_id *id);
+
+static struct v4v_ring_info *v4v_ring_find_info_by_addr (struct domain *d,
+                                                         struct v4v_addr *a,
+                                                         domid_t p);
+
+/*
+ * locks
+ */
+
+/*
+ * locking is organized as follows:
+ *
+ * the global lock v4v_lock: L1 protects the v4v elements
+ * of all struct domain *d in the system, it does not
+ * protect any of the elements of d->v4v, just their
+ * addresses. By extension since the destruction of
+ * a domain with a non-NULL d->v4v will need to free
+ * the d->v4v pointer, holding this lock gauruntees
+ * that no domains pointers in which v4v is interested
+ * become invalid whilst this lock is held.
+ */
+
+static DEFINE_RWLOCK (v4v_lock); /* L1 */
+
+/*
+ * the lock d->v4v->lock: L2:  Read on protects the hash table and
+ * the elements in the hash_table d->v4v->ring_hash, and
+ * the node and id fields in struct v4v_ring_info in the
+ * hash table. Write on L2 protects all of the elements of
+ * struct v4v_ring_info. To take L2 you must already have R(L1)
+ * W(L1) implies W(L2) and L3
+ *
+ * the lock v4v_ring_info *ringinfo; ringinfo->lock: L3:
+ * protects len,tx_ptr the guest ring, the
+ * guest ring_data and the pending list. To take L3 you must
+ * already have R(L2). W(L2) implies L3
+ */
+
+
+/*
+ * Debugs
+ */
+
+#ifdef V4V_DEBUG
+static void
+v4v_hexdump (void *_p, int len)
+{
+    uint8_t *buf = (uint8_t *) _p;
+    int i, j;
+
+    for (i = 0; i < len; i += 16)
+    {
+        printk (KERN_ERR "%p:", &buf[i]);
+        for (j = 0; j < 16; ++j)
+        {
+            int k = i + j;
+            if (k < len)
+                printk (" %02x", buf[k]);
+            else
+                printk ("   ");
+        }
+        printk (" ");
+
+        for (j = 0; j < 16; ++j)
+        {
+            int k = i + j;
+            if (k < len)
+                printk ("%c", ((buf[k] > 32) && (buf[k] < 127)) ? buf[k] : '.');
+            else
+                printk (" ");
+        }
+        printk ("\n");
+    }
+}
+#endif
+
+
+/*
+ * Event channel
+ */
+
+static void
+v4v_signal_domain (struct domain *d)
+{
+    v4v_dprintk("send guest VIRQ_V4V domid:%d\n", d->domain_id);
+    send_guest_vcpu_virq (d->vcpu[0], VIRQ_V4V);
+}
+
+static void
+v4v_signal_domid (domid_t id)
+{
+    struct domain *d = get_domain_by_id (id);
+    if (!d)
+        return;
+    v4v_signal_domain (d);
+    put_domain (d);
+}
+
+
+/*
+ * ring buffer
+ */
+
+/* called must have L3 */
+static void
+v4v_ring_unmap (struct v4v_ring_info *ring_info)
+{
+    int i;
+    for (i = 0; i < ring_info->npage; ++i)
+    {
+        if (!ring_info->mfn_mapping[i])
+            continue;
+        v4v_dprintk("");
+        v4v_dprintk("unmapping page %p from %p\n",
+                (void*) mfn_x (ring_info->mfns[i]),
+                ring_info->mfn_mapping[i]);
+
+        unmap_domain_page (ring_info->mfn_mapping[i]);
+        ring_info->mfn_mapping[i] = NULL;
+    }
+}
+
+/* called must have L3 */
+static uint8_t *
+v4v_ring_map_page (struct v4v_ring_info *ring_info, int i)
+{
+    if (i >= ring_info->npage)
+        return NULL;
+    if (ring_info->mfn_mapping[i])
+        return ring_info->mfn_mapping[i];
+    ring_info->mfn_mapping[i] = map_domain_page (mfn_x (ring_info->mfns[i]));
+
+    v4v_dprintk("mapping page %p to %p\n",
+            (void *) mfn_x (ring_info->mfns[i]),
+            ring_info->mfn_mapping[i]);
+    return ring_info->mfn_mapping[i];
+}
+
+/* called must have L3 */
+static int
+v4v_memcpy_from_guest_ring (void *_dst, struct v4v_ring_info *ring_info,
+                            uint32_t offset, uint32_t len)
+{
+    int page = offset >> PAGE_SHIFT;
+    uint8_t *src;
+    uint8_t *dst = _dst;
+
+    offset &= PAGE_SIZE - 1;
+
+    while ((offset + len) > PAGE_SIZE)
+    {
+        src = v4v_ring_map_page (ring_info, page);
+
+        if (!src)
+        {
+            return -EFAULT;
+        }
+
+        v4v_dprintk("memcpy(%p,%p+%d,%d)\n",
+                dst, src, offset,
+                (int) (PAGE_SIZE - offset));
+        memcpy (dst, src + offset, PAGE_SIZE - offset);
+
+        page++;
+        len -= PAGE_SIZE - offset;
+        dst += PAGE_SIZE - offset;
+        offset = 0;
+    }
+
+    src = v4v_ring_map_page (ring_info, page);
+    if (!src)
+    {
+        return -EFAULT;
+    }
+
+    v4v_dprintk("memcpy(%p,%p+%d,%d)\n", dst, src, offset, len);
+    memcpy (dst, src + offset, len);
+
+    return 0;
+}
+
+
+/* called must have L3 */
+static int
+v4v_update_tx_ptr (struct v4v_ring_info *ring_info, uint32_t tx_ptr)
+{
+    uint8_t *dst = v4v_ring_map_page (ring_info, 0);
+    volatile uint32_t *p = (uint32_t *)(dst + offsetof (v4v_ring_t, tx_ptr));
+
+    if (!dst)
+        return -EFAULT;
+    *p = tx_ptr;
+    return 0;
+}
+
+/* called must have L3 */
+static int
+v4v_memcpy_to_guest_ring (struct v4v_ring_info *ring_info, uint32_t offset,
+        void *_src, uint32_t len)
+{
+    int page = offset >> PAGE_SHIFT;
+    uint8_t *dst;
+    uint8_t *src = _src;
+
+    offset &= PAGE_SIZE - 1;
+
+    while ((offset + len) > PAGE_SIZE)
+    {
+        dst = v4v_ring_map_page (ring_info, page);
+
+        if (!dst)
+        {
+            v4v_dprintk("!dst\n");
+            return -EFAULT;
+        }
+
+#ifdef V4V_DEBUG
+        v4v_dprintk("memcpy(%p+%d,%p,%d)\n",
+                dst, offset, src,
+                (int) (PAGE_SIZE - offset));
+        v4v_hexdump (src, PAGE_SIZE - offset);
+        v4v_hexdump (dst + offset, PAGE_SIZE - offset);
+#endif
+        memcpy (dst + offset, src, PAGE_SIZE - offset);
+
+        page++;
+        len -= (PAGE_SIZE - offset);
+        src += (PAGE_SIZE - offset);
+        offset = 0;
+    }
+
+    dst = v4v_ring_map_page (ring_info, page);
+
+    if (!dst)
+    {
+        v4v_dprintk("attempted to map page %d of %d\n", page, ring_info->npage);
+        return -EFAULT;
+    }
+
+#ifdef V4V_DEBUG
+    v4v_dprintk("memcpy(%p+%d,%p,%d)\n",
+            dst, offset, src, len);
+    v4v_hexdump (src, len);
+    v4v_hexdump (dst + offset, len);
+#endif
+    memcpy (dst + offset, src, len);
+
+    return 0;
+}
+
+/*called must have L3*/
+static int
+v4v_memcpy_to_guest_ring_from_guest(struct v4v_ring_info *ring_info,
+                                    uint32_t offset,
+                                    XEN_GUEST_HANDLE (uint8_t) src_hnd,
+                                    uint32_t len)
+{
+    int page = offset >> PAGE_SHIFT;
+    uint8_t *dst;
+
+    offset &= PAGE_SIZE - 1;
+
+    while ( (offset + len) > PAGE_SIZE )
+    {
+        dst = v4v_ring_map_page (ring_info, page);
+
+        if ( !dst )
+        {
+            v4v_dprintk("!dst\n");
+            return -EFAULT;
+        }
+
+        v4v_dprintk("copy_from_guest(%p+%d,%p,%d)\n",
+                    dst, offset, (void *) src_hnd.p,
+                    (int) (PAGE_SIZE - offset));
+        if ( copy_from_guest ((dst + offset), src_hnd, PAGE_SIZE - offset) )
+        {
+            v4v_dprintk("copy_from_guest failed\n");
+            return -EFAULT;
+        }
+
+        page++;
+        len -= PAGE_SIZE - offset;
+        guest_handle_add_offset (src_hnd, PAGE_SIZE - offset);
+        offset = 0;
+    }
+
+    dst = v4v_ring_map_page (ring_info, page);
+    if (!dst)
+    {
+        v4v_dprintk("v4v_ring_map failed\n");
+        return -EFAULT;
+    }
+
+    v4v_dprintk("copy_from_guest(%p+%d,%p,%d)\n",
+                dst, offset, (void *) src_hnd.p, len);
+    if ( copy_from_guest ((dst + offset), src_hnd, len) )
+    {
+        v4v_dprintk("copy_from_guest failed\n");
+        return -EFAULT;
+    }
+
+    return 0;
+}
+
+static int
+v4v_ringbuf_get_rx_ptr (struct domain *d, struct v4v_ring_info *ring_info,
+                        uint32_t * rx_ptr)
+{
+    v4v_ring_t *ringp;
+
+    if ( ring_info->npage == 0 )
+        return -1;
+
+    ringp = map_domain_page (mfn_x (ring_info->mfns[0]));
+
+    v4v_dprintk("v4v_ringbuf_payload_space: mapped %p to %p\n",
+                (void *) mfn_x (ring_info->mfns[0]), ringp);
+    if ( !ringp )
+        return -1;
+
+    *rx_ptr = *(volatile uint32_t *) &ringp->rx_ptr;
+
+    unmap_domain_page (mfn_x (ring_info->mfns[0]));
+    return 0;
+}
+
+uint32_t
+v4v_ringbuf_payload_space (struct domain * d, struct v4v_ring_info * ring_info)
+{
+    v4v_ring_t ring;
+    int32_t ret;
+
+    ring.tx_ptr = ring_info->tx_ptr;
+    ring.len = ring_info->len;
+
+    if ( v4v_ringbuf_get_rx_ptr (d, ring_info, &ring.rx_ptr) )
+        return 0;
+
+    v4v_dprintk("v4v_ringbuf_payload_space:tx_ptr=%d rx_ptr=%d\n",
+                (int) ring.tx_ptr, (int) ring.rx_ptr);
+    if ( ring.rx_ptr == ring.tx_ptr )
+        return ring.len - sizeof (struct v4v_ring_message_header);
+
+    ret = ring.rx_ptr - ring.tx_ptr;
+    if ( ret < 0 )
+        ret += ring.len;
+
+    ret -= sizeof (struct v4v_ring_message_header);
+    ret -= V4V_ROUNDUP (1);
+
+    return (ret < 0) ? 0 : ret;
+}
+
+/*caller must have L3*/
+static size_t
+v4v_ringbuf_insert (struct domain *d,
+                    struct v4v_ring_info *ring_info,
+                    struct v4v_ring_id *src_id, uint32_t proto,
+                    XEN_GUEST_HANDLE (void) buf_hnd_void, uint32_t len)
+{
+    XEN_GUEST_HANDLE (uint8_t) buf_hnd =
+        guest_handle_cast (buf_hnd_void, uint8_t);
+    v4v_ring_t ring;
+    struct v4v_ring_message_header mh = { 0 };
+    int32_t sp;
+    int32_t happy_ret = len;
+    int32_t ret = 0;
+
+    if ( (V4V_ROUNDUP (len) + sizeof (struct v4v_ring_message_header)) >=
+            ring_info->len )
+    {
+        v4v_dprintk("EMSGSIZE\n");
+        return -EMSGSIZE;
+    }
+
+    do
+    {
+        if ( (ret = v4v_memcpy_from_guest_ring (&ring, ring_info,
+                                                0, sizeof (ring))) )
+            break;
+
+        ring.tx_ptr = ring_info->tx_ptr;
+        ring.len = ring_info->len;
+
+        v4v_dprintk("ring.tx_ptr=%d ring.rx_ptr=%d ring.len=%d ring_info->tx_ptr=%d\n",
+                    ring.tx_ptr, ring.rx_ptr, ring.len, ring_info->tx_ptr);
+
+        if ( ring.rx_ptr == ring.tx_ptr )
+            sp = ring_info->len;
+        else
+        {
+            sp = ring.rx_ptr - ring.tx_ptr;
+            if (sp < 0)
+                sp += ring.len;
+        }
+
+        if ( (V4V_ROUNDUP (len) + sizeof (struct v4v_ring_message_header)) >= sp )
+        {
+            v4v_dprintk("EAGAIN\n");
+            ret = -EAGAIN;
+            break;
+        }
+
+        mh.len = len + sizeof (struct v4v_ring_message_header);
+        mh.source = src_id->addr;
+        mh.protocol = proto;
+
+        if ( (ret = v4v_memcpy_to_guest_ring (ring_info,
+                                              ring.tx_ptr + sizeof (v4v_ring_t),
+                                              &mh, sizeof (mh))) )
+            break;
+
+        ring.tx_ptr += sizeof (mh);
+        if ( ring.tx_ptr == ring_info->len )
+            ring.tx_ptr = 0;
+
+        sp = ring.len - ring.tx_ptr;
+
+        if ( len > sp )
+        {
+            if ((ret = v4v_memcpy_to_guest_ring_from_guest (ring_info,
+                            ring.tx_ptr + sizeof (v4v_ring_t),
+                            buf_hnd, sp)))
+                break;
+
+            ring.tx_ptr = 0;
+            len -= sp;
+            guest_handle_add_offset (buf_hnd, sp);
+        }
+
+        if ( (ret = v4v_memcpy_to_guest_ring_from_guest (ring_info,
+                        ring.tx_ptr + sizeof (v4v_ring_t),
+                        buf_hnd, len)) )
+            break;
+
+        ring.tx_ptr += V4V_ROUNDUP (len);
+
+        if ( ring.tx_ptr == ring_info->len )
+            ring.tx_ptr = 0;
+
+        mb ();
+        ring_info->tx_ptr = ring.tx_ptr;
+
+        if ( (ret = v4v_update_tx_ptr(ring_info, ring.tx_ptr)) )
+            break;
+
+    }
+    while ( 0 );
+
+    v4v_ring_unmap (ring_info);
+
+    return ret ? ret : happy_ret;
+
+}
+
+static ssize_t
+v4v_iov_count (XEN_GUEST_HANDLE (v4v_iov_t) iovs, int niov)
+{
+    v4v_iov_t iov;
+    size_t ret = 0;
+
+    while ( niov-- )
+    {
+        if ( copy_from_guest (&iov, iovs, 1) )
+            return -EFAULT;
+
+        ret += iov.iov_len;
+        guest_handle_add_offset (iovs, 1);
+    }
+
+    return ret;
+}
+
+/*caller must have L3*/
+static ssize_t
+v4v_ringbuf_insertv (struct domain *d,
+                     struct v4v_ring_info *ring_info,
+                     struct v4v_ring_id *src_id, uint32_t proto,
+                     XEN_GUEST_HANDLE (v4v_iov_t) iovs, uint32_t niov,
+                     uint32_t len)
+{
+    v4v_ring_t ring;
+    struct v4v_ring_message_header mh = { 0 };
+    int32_t sp;
+    int32_t happy_ret;
+    int32_t ret = 0;
+
+    happy_ret = len;
+
+    if ( (V4V_ROUNDUP (len) + sizeof (struct v4v_ring_message_header) ) >=
+            ring_info->len)
+        return -EMSGSIZE;
+
+    do
+    {
+        if ( (ret = v4v_memcpy_from_guest_ring (&ring, ring_info, 0,
+                                               sizeof (ring))) )
+            break;
+
+        ring.tx_ptr = ring_info->tx_ptr;
+        ring.len = ring_info->len;
+
+        v4v_dprintk("ring.tx_ptr=%d ring.rx_ptr=%d ring.len=%d ring_info->tx_ptr=%d\n",
+                    ring.tx_ptr, ring.rx_ptr, ring.len, ring_info->tx_ptr);
+
+        if ( ring.rx_ptr == ring.tx_ptr )
+            sp = ring_info->len;
+        else
+        {
+            sp = ring.rx_ptr - ring.tx_ptr;
+            if (sp < 0)
+                sp += ring.len;
+        }
+
+        if ( (V4V_ROUNDUP (len) + sizeof (struct v4v_ring_message_header) ) >= sp)
+        {
+            v4v_dprintk("EAGAIN\n");
+            ret = -EAGAIN;
+            break;
+        }
+
+        mh.len = len + sizeof (struct v4v_ring_message_header);
+        mh.source = src_id->addr;
+        mh.protocol = proto;
+
+        if ( (ret = v4v_memcpy_to_guest_ring (ring_info,
+                                              ring.tx_ptr + sizeof (v4v_ring_t),
+                                              &mh, sizeof (mh))) )
+            break;
+
+        ring.tx_ptr += sizeof (mh);
+        if ( ring.tx_ptr == ring_info->len )
+            ring.tx_ptr = 0;
+
+        while ( niov-- )
+        {
+            XEN_GUEST_HANDLE (uint8_t) buf_hnd;
+            v4v_iov_t iov;
+
+            if ( copy_from_guest (&iov, iovs, 1) )
+            {
+                ret = -EFAULT;
+                break;
+            }
+
+            buf_hnd.p = (uint8_t *) iov.iov_base; //FIXME
+            len = iov.iov_len;
+
+            if ( unlikely (!guest_handle_okay (buf_hnd, len)) )
+            {
+                ret = -EFAULT;
+                break;
+            }
+
+            sp = ring.len - ring.tx_ptr;
+
+            if ( len > sp )
+            {
+                if ( (ret = v4v_memcpy_to_guest_ring_from_guest (ring_info,
+                                ring.tx_ptr +
+                                sizeof (v4v_ring_t),
+                                buf_hnd, sp)) )
+                    break;
+
+                ring.tx_ptr = 0;
+                len -= sp;
+                guest_handle_add_offset (buf_hnd, sp);
+            }
+
+            if ( (ret = v4v_memcpy_to_guest_ring_from_guest (ring_info,
+                            ring.tx_ptr +
+                            sizeof (v4v_ring_t),
+                            buf_hnd, len)) )
+                break;
+
+            ring.tx_ptr += len;
+
+            if (ring.tx_ptr == ring_info->len)
+                ring.tx_ptr = 0;
+
+            guest_handle_add_offset (iovs, 1);
+        }
+        if ( ret )
+            break;
+
+        ring.tx_ptr = V4V_ROUNDUP (ring.tx_ptr);
+
+        if ( ring.tx_ptr >= ring_info->len )
+            ring.tx_ptr -= ring_info->len;
+
+        mb ();
+        ring_info->tx_ptr = ring.tx_ptr;
+        if ( (ret = v4v_update_tx_ptr(ring_info, ring.tx_ptr)) )
+            break;
+    }
+    while ( 0 );
+
+    v4v_ring_unmap (ring_info);
+
+    return ret ? ret : happy_ret;
+}
+
+
+
+/* pending */
+static void
+v4v_pending_remove_ent (struct v4v_pending_ent *ent)
+{
+    hlist_del (&ent->node);
+    xfree (ent);
+}
+
+/*caller must have L3 */
+static void
+v4v_pending_remove_all (struct v4v_ring_info *info)
+{
+    struct hlist_node *node, *next;
+    struct v4v_pending_ent *pending_ent;
+
+    hlist_for_each_entry_safe (pending_ent, node, next, &info->pending,
+            node) v4v_pending_remove_ent (pending_ent);
+}
+
+/*Caller must hold L1 */
+static void
+v4v_pending_notify (struct domain *caller_d, struct hlist_head *to_notify)
+{
+    struct hlist_node *node, *next;
+    struct v4v_pending_ent *pending_ent;
+
+    hlist_for_each_entry_safe (pending_ent, node, next, to_notify, node)
+    {
+        hlist_del (&pending_ent->node);
+        v4v_signal_domid (pending_ent->id);
+        xfree (pending_ent);
+    }
+
+}
+
+/*caller must have R(L2) */
+static void
+v4v_pending_find (struct v4v_ring_info *ring_info, uint32_t payload_space,
+        struct hlist_head *to_notify)
+{
+    struct hlist_node *node, *next;
+    struct v4v_pending_ent *ent;
+
+    spin_lock (&ring_info->lock);
+    hlist_for_each_entry_safe (ent, node, next, &ring_info->pending, node)
+    {
+        if (payload_space >= ent->len)
+        {
+            hlist_del (&ent->node);
+            hlist_add_head (&ent->node, to_notify);
+        }
+    }
+    spin_unlock (&ring_info->lock);
+}
+
+/*caller must have L3 */
+static int
+v4v_pending_queue (struct v4v_ring_info *ring_info, domid_t src_id, int len)
+{
+    struct v4v_pending_ent *ent = xmalloc (struct v4v_pending_ent);
+
+    if ( !ent )
+    {
+        v4v_dprintk("ENOMEM\n");
+        return -ENOMEM;
+    }
+
+    ent->len = len;
+    ent->id = src_id;
+
+    hlist_add_head (&ent->node, &ring_info->pending);
+
+    return 0;
+}
+
+/* L3 */
+static int
+v4v_pending_requeue (struct v4v_ring_info *ring_info, domid_t src_id, int len)
+{
+    struct hlist_node *node;
+    struct v4v_pending_ent *ent;
+
+    hlist_for_each_entry (ent, node, &ring_info->pending, node)
+    {
+        if ( ent->id == src_id )
+        {
+            if ( ent->len < len )
+                ent->len = len;
+            return 0;
+        }
+    }
+
+    return v4v_pending_queue (ring_info, src_id, len);
+}
+
+
+/* L3 */
+static void
+v4v_pending_cancel (struct v4v_ring_info *ring_info, domid_t src_id)
+{
+    struct hlist_node *node, *next;
+    struct v4v_pending_ent *ent;
+
+    hlist_for_each_entry_safe (ent, node, next, &ring_info->pending, node)
+    {
+        if ( ent->id == src_id)
+        {
+            hlist_del (&ent->node);
+            xfree (ent);
+        }
+    }
+}
+
+/*
+ * ring data
+ */
+
+/*Caller should hold R(L1)*/
+static int
+v4v_fill_ring_data (struct domain *src_d,
+                    XEN_GUEST_HANDLE (v4v_ring_data_ent_t) data_ent_hnd)
+{
+    v4v_ring_data_ent_t ent;
+    struct domain *dst_d;
+    struct v4v_ring_info *ring_info;
+
+    if ( copy_from_guest (&ent, data_ent_hnd, 1) )
+    {
+        v4v_dprintk("EFAULT\n");
+        return -EFAULT;
+    }
+
+    v4v_dprintk("v4v_fill_ring_data: ent.ring.domain=%d,ent.ring.port=%u\n",
+                (int) ent.ring.domain, (int) ent.ring.port);
+
+    ent.flags = 0;
+
+    dst_d = get_domain_by_id (ent.ring.domain);
+
+    if ( dst_d && dst_d->v4v )
+    {
+        read_lock (&dst_d->v4v->lock);
+        ring_info = v4v_ring_find_info_by_addr (dst_d, &ent.ring,
+                                                src_d->domain_id);
+
+        if ( ring_info )
+        {
+            uint32_t space_avail;
+
+            ent.flags |= V4V_RING_DATA_F_EXISTS;
+            ent.max_message_size =
+                ring_info->len - sizeof (struct v4v_ring_message_header) -
+                V4V_ROUNDUP (1);
+            spin_lock (&ring_info->lock);
+
+            space_avail = v4v_ringbuf_payload_space (dst_d, ring_info);
+
+            if ( space_avail >= ent.space_required )
+            {
+                v4v_pending_cancel (ring_info, src_d->domain_id);
+                ent.flags |= V4V_RING_DATA_F_SUFFICIENT;
+            }
+            else
+            {
+                v4v_pending_requeue (ring_info, src_d->domain_id,
+                        ent.space_required);
+                ent.flags |= V4V_RING_DATA_F_PENDING;
+            }
+
+            spin_unlock (&ring_info->lock);
+
+            if ( space_avail == ent.max_message_size )
+                ent.flags |= V4V_RING_DATA_F_EMPTY;
+
+        }
+        read_unlock (&dst_d->v4v->lock);
+    }
+
+    if ( dst_d )
+        put_domain (dst_d);
+
+    if ( copy_field_to_guest (data_ent_hnd, &ent, flags) )
+    {
+        v4v_dprintk("EFAULT\n");
+        return -EFAULT;
+    }
+    return 0;
+}
+
+/*Called should hold no more than R(L1) */
+static int
+v4v_fill_ring_datas (struct domain *d, int nent,
+                     XEN_GUEST_HANDLE (v4v_ring_data_ent_t) data_ent_hnd)
+{
+    int ret = 0;
+
+    read_lock (&v4v_lock);
+    while ( !ret && nent-- )
+    {
+        ret = v4v_fill_ring_data (d, data_ent_hnd);
+        guest_handle_add_offset (data_ent_hnd, 1);
+    }
+    read_unlock (&v4v_lock);
+    return ret;
+}
+
+/*
+ * ring
+ */
+static int
+v4v_find_ring_mfns (struct domain *d, struct v4v_ring_info *ring_info,
+                    uint32_t npage, XEN_GUEST_HANDLE (v4v_pfn_t) pfn_hnd)
+{
+    int i,j;
+    mfn_t *mfns;
+    uint8_t **mfn_mapping;
+    unsigned long mfn;
+    struct page_info *page;
+    int ret = 0;
+
+    if ((npage << PAGE_SHIFT) < ring_info->len)
+    {
+        v4v_dprintk("EINVAL\n");
+        return -EINVAL;
+    }
+
+    mfns = xmalloc_array (mfn_t, npage);
+    if ( !mfns )
+    {
+        v4v_dprintk("ENOMEM\n");
+        return -ENOMEM;
+    }
+
+    mfn_mapping = xmalloc_array (uint8_t *, npage);
+    if ( !mfn_mapping )
+    {
+        xfree (mfns);
+        return -ENOMEM;
+    }
+
+    for (i = 0; i < npage; ++i)
+    {
+        unsigned long pfn;
+        p2m_type_t p2mt;
+
+        if ( copy_from_guest_offset (&pfn, pfn_hnd, i, 1) )
+        {
+            ret = -EFAULT;
+            v4v_dprintk("EFAULT\n");
+            break;
+        }
+
+        mfn = mfn_x(get_gfn(d, pfn, &p2mt));
+        if ( !mfn_valid(mfn) )
+        {
+            printk(KERN_ERR "v4v domain %d passed invalid mfn %"PRI_mfn" ring %p seq %d\n",
+                    d->domain_id, mfn, ring_info, i);
+            ret = -EINVAL;
+            break;
+        }
+        page = mfn_to_page(mfn);
+        if ( !get_page_and_type(page, d, PGT_writable_page) )
+        {
+            printk(KERN_ERR "v4v domain %d passed wrong type mfn %"PRI_mfn" ring %p seq %d\n",
+                    d->domain_id, mfn, ring_info, i);
+            ret = -EINVAL;
+            break;
+        }
+        mfns[i] = _mfn(mfn);
+        v4v_dprintk("v4v_find_ring_mfns: %d: %lx -> %lx\n",
+                    i, (unsigned long) pfn, (unsigned long) mfn_x (mfns[i]));
+        mfn_mapping[i] = NULL;
+        put_gfn(d, pfn);
+    }
+
+    if ( !ret )
+    {
+        ring_info->npage = npage;
+        ring_info->mfns = mfns;
+        ring_info->mfn_mapping = mfn_mapping;
+    }
+    else
+    {
+        j = i;
+        for ( i = 0; i < j; ++i )
+            if ( mfn_x(mfns[i]) != 0 )
+                put_page_and_type(mfn_to_page(mfn_x(mfns[i])));
+        xfree (mfn_mapping);
+        xfree (mfns);
+        v4v_dprintk("");
+    }
+    return ret;
+}
+
+
+/* caller must hold R(L2) */
+static struct v4v_ring_info *
+v4v_ring_find_info (struct domain *d, struct v4v_ring_id *id)
+{
+    uint16_t hash;
+    struct hlist_node *node;
+    struct v4v_ring_info *ring_info;
+
+    hash = v4v_hash_fn (id);
+
+    v4v_dprintk("ring_find_info: d->v4v=%p, d->v4v->ring_hash[%d]=%p id=%p\n",
+                d->v4v, (int) hash, d->v4v->ring_hash[hash].first, id);
+    v4v_dprintk("ring_find_info: id.addr.port=%d id.addr.domain=%d id.addr.partner=%d\n",
+                id->addr.port, id->addr.domain, id->partner);
+
+    hlist_for_each_entry (ring_info, node, &d->v4v->ring_hash[hash], node)
+    {
+        if ( !memcmp (id, &ring_info->id, sizeof (*id)) )
+        {
+            v4v_dprintk("ring_find_info: ring_info=%p\n", ring_info);
+            return ring_info;
+        }
+    }
+    v4v_dprintk("ring_find_info: no ring_info found\n");
+    return NULL;
+}
+
+/* caller must hold R(L2) */
+static struct v4v_ring_info *
+v4v_ring_find_info_by_addr (struct domain *d, struct v4v_addr *a, domid_t p)
+{
+    struct v4v_ring_id id;
+    struct v4v_ring_info *ret;
+
+    if ( !a )
+        return NULL;
+
+    id.addr.port = a->port;
+    id.addr.domain = d->domain_id;
+    id.partner = p;
+
+    ret = v4v_ring_find_info (d, &id);
+    if ( ret )
+        return ret;
+
+    id.partner = V4V_DOMID_NONE;
+
+    return v4v_ring_find_info (d, &id);
+}
+
+/*caller must hold W(L2) */
+static void v4v_ring_remove_mfns (struct v4v_ring_info *ring_info)
+{
+    int i;
+
+    if ( ring_info->mfns )
+    {
+        for ( i=0; i < ring_info->npage; ++i )
+            if (mfn_x(ring_info->mfns[i]) != 0)
+                put_page_and_type(mfn_to_page(mfn_x(ring_info->mfns[i])));
+        xfree (ring_info->mfns);
+    }
+    ring_info->mfns = NULL;
+}
+
+/*caller must hold W(L2) */
+static void
+v4v_ring_remove_info (struct v4v_ring_info *ring_info)
+{
+    v4v_pending_remove_all (ring_info);
+
+    hlist_del (&ring_info->node);
+    v4v_ring_remove_mfns(ring_info);
+    xfree (ring_info);
+}
+
+/* Call from guest to unpublish a ring */
+static long
+v4v_ring_remove (struct domain *d, XEN_GUEST_HANDLE (v4v_ring_t) ring_hnd)
+{
+    struct v4v_ring ring;
+    struct v4v_ring_info *ring_info;
+    int ret = 0;
+
+    read_lock (&v4v_lock);
+
+    do
+    {
+        if ( !d->v4v )
+        {
+            v4v_dprintk("EINVAL\n");
+            ret = -EINVAL;
+            break;
+        }
+
+        if ( copy_from_guest (&ring, ring_hnd, 1) )
+        {
+            v4v_dprintk("EFAULT\n");
+            ret = -EFAULT;
+            break;
+        }
+
+        if ( ring.magic != V4V_RING_MAGIC )
+        {
+            v4v_dprintk("ring.magic(%lx) != V4V_RING_MAGIC(%lx), EINVAL\n",
+                    ring.magic, V4V_RING_MAGIC);
+            ret = -EINVAL;
+            break;
+        }
+
+        ring.id.addr.domain = d->domain_id;
+
+        write_lock (&d->v4v->lock);
+        ring_info = v4v_ring_find_info (d, &ring.id);
+
+        if ( ring_info )
+            v4v_ring_remove_info (ring_info);
+
+        write_unlock (&d->v4v->lock);
+
+        if ( !ring_info )
+        {
+            v4v_dprintk( "ENOENT\n" );
+            ret = -ENOENT;
+            break;
+        }
+
+    }
+    while ( 0 );
+
+    read_unlock (&v4v_lock);
+    return ret;
+}
+
+/* call from guest to publish a ring */
+static long
+v4v_ring_add (struct domain *d, XEN_GUEST_HANDLE (v4v_ring_t) ring_hnd,
+              uint32_t npage, XEN_GUEST_HANDLE (v4v_pfn_t) pfn_hnd)
+{
+    struct v4v_ring ring;
+    struct v4v_ring_info *ring_info;
+    int need_to_insert = 0;
+    int ret = 0;
+
+    if ( (long) ring_hnd.p & (PAGE_SIZE - 1) )
+    {
+        v4v_dprintk("EINVAL\n");
+        return -EINVAL;
+    }
+
+    read_lock (&v4v_lock);
+    do
+    {
+        if ( !d->v4v )
+        {
+            v4v_dprintk(" !d->v4v, EINVAL\n");
+            ret = -EINVAL;
+            break;
+        }
+
+        if ( copy_from_guest (&ring, ring_hnd, 1) )
+        {
+            v4v_dprintk(" copy_from_guest failed, EFAULT\n");
+            ret = -EFAULT;
+            break;
+        }
+
+        if ( ring.magic != V4V_RING_MAGIC )
+        {
+            v4v_dprintk("ring.magic(%lx) != V4V_RING_MAGIC(%lx), EINVAL\n",
+                        ring.magic, V4V_RING_MAGIC);
+            ret = -EINVAL;
+            break;
+        }
+
+        if ( (ring.len <
+                    (sizeof (struct v4v_ring_message_header) + V4V_ROUNDUP (1) +
+                     V4V_ROUNDUP (1))) || (V4V_ROUNDUP (ring.len) != ring.len) )
+        {
+            v4v_dprintk("EINVAL\n");
+            ret = -EINVAL;
+            break;
+        }
+
+        ring.id.addr.domain = d->domain_id;
+        if ( copy_field_to_guest (ring_hnd, &ring, id) )
+        {
+            v4v_dprintk("EFAULT\n");
+            ret = -EFAULT;
+            break;
+        }
+
+        /*
+         * no need for a lock yet, because only we know about this
+         * set the tx pointer if it looks bogus (we don't reset it
+         * because this might be a re-register after S4)
+         */
+        if ( (ring.tx_ptr >= ring.len)
+                || (V4V_ROUNDUP (ring.tx_ptr) != ring.tx_ptr) )
+        {
+            ring.tx_ptr = ring.rx_ptr;
+        }
+        copy_field_to_guest (ring_hnd, &ring, tx_ptr);
+
+        read_lock (&d->v4v->lock);
+        ring_info = v4v_ring_find_info (d, &ring.id);
+
+        if ( !ring_info )
+        {
+            read_unlock (&d->v4v->lock);
+            ring_info = xmalloc (struct v4v_ring_info);
+            if ( !ring_info )
+            {
+                v4v_dprintk("ENOMEM\n");
+                ret = -ENOMEM;
+                break;
+            }
+            need_to_insert++;
+            spin_lock_init (&ring_info->lock);
+            INIT_HLIST_HEAD (&ring_info->pending);
+            ring_info->mfns = NULL;
+
+        }
+        else
+        {
+            /*
+             * Ring info already existed. If mfn list was already
+             * populated remove the MFN's from list and then add
+             * the new list.
+             */
+            printk(KERN_INFO "v4v: dom%d re-registering existing ring, clearing MFN list\n",
+                    current->domain->domain_id);
+            v4v_ring_remove_mfns(ring_info);
+        }
+
+        spin_lock (&ring_info->lock);
+        ring_info->id = ring.id;
+        ring_info->len = ring.len;
+        ring_info->tx_ptr = ring.tx_ptr;
+        ring_info->ring = ring_hnd;
+        if ( ring_info->mfns )
+            xfree (ring_info->mfns);
+        ret = v4v_find_ring_mfns (d, ring_info, npage, pfn_hnd);
+        spin_unlock (&ring_info->lock);
+        if ( ret )
+            break;
+
+        if ( !need_to_insert )
+        {
+            read_unlock (&d->v4v->lock);
+        }
+        else
+        {
+            uint16_t hash = v4v_hash_fn (&ring.id);
+            write_lock (&d->v4v->lock);
+            hlist_add_head (&ring_info->node, &d->v4v->ring_hash[hash]);
+            write_unlock (&d->v4v->lock);
+        }
+    }
+    while ( 0 );
+
+    read_unlock (&v4v_lock);
+    return ret;
+}
+
+
+/*
+ * io
+ */
+
+/*Caller must hold v4v_lock and hash_lock*/
+static void
+v4v_notify_ring (struct domain *d, struct v4v_ring_info *ring_info,
+        struct hlist_head *to_notify)
+{
+    uint32_t space;
+
+    spin_lock (&ring_info->lock);
+    space = v4v_ringbuf_payload_space (d, ring_info);
+    spin_unlock (&ring_info->lock);
+
+    v4v_pending_find (ring_info, space, to_notify);
+}
+
+/*notify hypercall*/
+static long
+v4v_notify (struct domain *d,
+            XEN_GUEST_HANDLE (v4v_ring_data_t) ring_data_hnd)
+{
+    v4v_ring_data_t ring_data;
+    HLIST_HEAD (to_notify);
+    int i;
+    int ret = 0;
+
+    read_lock (&v4v_lock);
+
+    if ( !d->v4v )
+    {
+        read_unlock (&v4v_lock);
+        v4v_dprintk("!d->v4v, ENODEV\n");
+        return -ENODEV;
+    }
+
+    read_lock (&d->v4v->lock);
+    for ( i = 0; i < V4V_HTABLE_SIZE; ++i )
+    {
+        struct hlist_node *node, *next;
+        struct v4v_ring_info *ring_info;
+
+        hlist_for_each_entry_safe (ring_info, node,
+                next, &d->v4v->ring_hash[i],
+                node)
+        {
+            v4v_notify_ring (d, ring_info, &to_notify);
+        }
+    }
+    read_unlock (&d->v4v->lock);
+
+    if ( !hlist_empty (&to_notify) )
+    {
+        v4v_pending_notify (d, &to_notify);
+    }
+
+    do
+    {
+        if ( !guest_handle_is_null (ring_data_hnd) )
+        {
+            /* Quick sanity check on ring_data_hnd */
+            if ( copy_field_from_guest (&ring_data, ring_data_hnd, magic) )
+            {
+                v4v_dprintk("copy_field_from_guest failed\n");
+                ret = -EFAULT;
+                break;
+            }
+
+            if ( ring_data.magic != V4V_RING_DATA_MAGIC )
+            {
+                v4v_dprintk("ring.magic(%lx) != V4V_RING_MAGIC(%lx), EINVAL\n",
+                        ring_data.magic, V4V_RING_MAGIC);
+                ret = -EINVAL;
+                break;
+            }
+
+            if (copy_from_guest (&ring_data, ring_data_hnd, 1))
+            {
+                v4v_dprintk("copy_from_guest failed\n");
+                ret = -EFAULT;
+                break;
+            }
+
+            {
+                XEN_GUEST_HANDLE (v4v_ring_data_ent_t) ring_data_ent_hnd;
+                ring_data_ent_hnd =
+                    guest_handle_for_field(ring_data_hnd, v4v_ring_data_ent_t, ring[0]);
+                ret = v4v_fill_ring_datas (d, ring_data.nent, ring_data_ent_hnd);
+            }
+        }
+    }
+    while ( 0 );
+
+    read_unlock (&v4v_lock);
+
+    return ret;
+}
+
+
+
+/*Hypercall to do the send*/
+static size_t
+v4v_send (struct domain *src_d, v4v_addr_t * src_addr,
+          v4v_addr_t * dst_addr, uint32_t proto,
+          XEN_GUEST_HANDLE (void) buf, size_t len)
+{
+    struct domain *dst_d;
+    struct v4v_ring_id src_id;
+    struct v4v_ring_info *ring_info;
+    int ret = 0;
+
+    if ( !dst_addr )
+    {
+        v4v_dprintk("!dst_addr\n");
+        return -EINVAL;
+    }
+
+    read_lock (&v4v_lock);
+    if ( !src_d->v4v )
+    {
+        read_unlock (&v4v_lock);
+        v4v_dprintk("!src_d->v4v\n");
+        return -EINVAL;
+    }
+
+    src_id.addr.port = src_addr->port;
+    src_id.addr.domain = src_d->domain_id;
+    src_id.partner = dst_addr->domain;
+
+    dst_d = get_domain_by_id (dst_addr->domain);
+    if ( !dst_d )
+    {
+        read_unlock (&v4v_lock);
+        v4v_dprintk("!dst_d, ECONNREFUSED\n");
+        return -ECONNREFUSED;
+    }
+
+    do
+    {
+        if ( !dst_d->v4v )
+        {
+            ret = -ECONNREFUSED;
+            v4v_dprintk("!dst_d->v4v, ECONNREFUSED\n");
+            break;
+        }
+
+        read_lock (&dst_d->v4v->lock);
+        ring_info =
+            v4v_ring_find_info_by_addr (dst_d, dst_addr, src_addr->domain);
+
+        if ( !ring_info )
+        {
+            ret = -ECONNREFUSED;
+            v4v_dprintk("!ring_info\n");
+        }
+        else
+        {
+            spin_lock (&ring_info->lock);
+            ret =
+                v4v_ringbuf_insert (dst_d, ring_info, &src_id, proto, buf, len);
+            if ( ret == -EAGAIN )
+            {
+                v4v_dprintk("ret == EAGAIN\n");
+                /* Schedule a wake up on the event channel when space is there */
+                if (v4v_pending_requeue (ring_info, src_d->domain_id, len))
+                {
+                    v4v_dprintk("v4v_pending_requeue failed, ENOMEM\n");
+                    ret = -ENOMEM;
+                }
+            }
+            spin_unlock (&ring_info->lock);
+
+            if (ret >= 0)
+            {
+                v4v_signal_domain (dst_d);
+            }
+        }
+        read_unlock (&dst_d->v4v->lock);
+    }
+    while ( 0 );
+
+    put_domain (dst_d);
+    read_unlock (&v4v_lock);
+    return ret;
+}
+
+/*Hypercall to do the send*/
+static size_t
+v4v_sendv (struct domain *src_d, v4v_addr_t * src_addr,
+           v4v_addr_t * dst_addr, uint32_t proto,
+           XEN_GUEST_HANDLE (v4v_iov_t) iovs, size_t niov)
+{
+    struct domain *dst_d;
+    struct v4v_ring_id src_id;
+    struct v4v_ring_info *ring_info;
+    int ret = 0;
+
+    if ( !dst_addr )
+    {
+        v4v_dprintk("!dst_addr, EINVAL\n");
+        return -EINVAL;
+    }
+
+    read_lock (&v4v_lock);
+    if (!src_d->v4v)
+    {
+        read_unlock (&v4v_lock);
+        v4v_dprintk("!src_d->v4v, EINVAL\n");
+        return -EINVAL;
+    }
+
+    src_id.addr.port = src_addr->port;
+    src_id.addr.domain = src_d->domain_id;
+    src_id.partner = dst_addr->domain;
+
+    dst_d = get_domain_by_id (dst_addr->domain);
+    if (!dst_d)
+    {
+        read_unlock (&v4v_lock);
+        v4v_dprintk("!dst_d, ECONNREFUSED\n");
+        return -ECONNREFUSED;
+    }
+
+    do
+    {
+        if ( !dst_d->v4v )
+        {
+            v4v_dprintk("dst_d->v4v, ECONNREFUSED\n");
+            ret = -ECONNREFUSED;
+            break;
+        }
+
+        read_lock (&dst_d->v4v->lock);
+        ring_info =
+            v4v_ring_find_info_by_addr (dst_d, dst_addr, src_addr->domain);
+
+        if ( !ring_info )
+        {
+            ret = -ECONNREFUSED;
+            v4v_dprintk(" !ring_info, ECONNREFUSED\n");
+        }
+        else
+        {
+            uint32_t len = v4v_iov_count (iovs, niov);
+
+            if ( len < 0 )
+            {
+                ret = len;
+                break;
+            }
+
+            spin_lock (&ring_info->lock);
+            ret =
+                v4v_ringbuf_insertv (dst_d, ring_info, &src_id, proto, iovs,
+                        niov, len);
+            if ( ret == -EAGAIN )
+            {
+                v4v_dprintk("v4v_ringbuf_insertv failed, EAGAIN\n");
+                /* Schedule a wake up on the event channel when space is there */
+                if (v4v_pending_requeue (ring_info, src_d->domain_id, len))
+                {
+                    v4v_dprintk("v4v_pending_requeue failed, ENOMEM\n");
+                    ret = -ENOMEM;
+                }
+            }
+            spin_unlock (&ring_info->lock);
+
+            if ( ret >= 0 )
+            {
+                v4v_signal_domain (dst_d);
+            }
+
+        }
+        read_unlock (&dst_d->v4v->lock);
+
+    }
+    while ( 0 );
+
+    put_domain (dst_d);
+    read_unlock (&v4v_lock);
+    return ret;
+}
+
+/*
+ * hypercall glue
+ */
+long
+do_v4v_op (int cmd, XEN_GUEST_HANDLE (void) arg1,
+           XEN_GUEST_HANDLE (void) arg2,
+           XEN_GUEST_HANDLE (void) arg3, uint32_t arg4, uint32_t arg5)
+{
+    struct domain *d = current->domain;
+    long rc = -EFAULT;
+
+    v4v_dprintk("->do_v4v_op(%d,%p,%p,%p,%d,%d)\n", cmd,
+                (void *) arg1.p, (void *) arg2.p, (void *) arg3.p,
+                (int) arg4, (int) arg5);
+
+    domain_lock (d);
+    switch (cmd)
+    {
+        case V4VOP_register_ring:
+            {
+                XEN_GUEST_HANDLE (v4v_ring_t) ring_hnd =
+                    guest_handle_cast (arg1, v4v_ring_t);
+                XEN_GUEST_HANDLE (v4v_pfn_t) pfn_hnd =
+                    guest_handle_cast (arg2, v4v_pfn_t);
+                uint32_t npage = arg4;
+                if ( unlikely (!guest_handle_okay (ring_hnd, 1)) )
+                    goto out;
+                if ( unlikely (!guest_handle_okay (pfn_hnd, npage)) )
+                    goto out;
+                rc = v4v_ring_add (d, ring_hnd, npage, pfn_hnd);
+                break;
+            }
+        case V4VOP_unregister_ring:
+            {
+                XEN_GUEST_HANDLE (v4v_ring_t) ring_hnd =
+                    guest_handle_cast (arg1, v4v_ring_t);
+                if ( unlikely (!guest_handle_okay (ring_hnd, 1)) )
+                    goto out;
+                rc = v4v_ring_remove (d, ring_hnd);
+                break;
+            }
+        case V4VOP_send:
+            {
+                v4v_addr_t src, dst;
+                uint32_t len = arg4;
+                uint32_t protocol = arg5;
+                XEN_GUEST_HANDLE (v4v_addr_t) src_hnd =
+                    guest_handle_cast (arg1, v4v_addr_t);
+                XEN_GUEST_HANDLE (v4v_addr_t) dst_hnd =
+                    guest_handle_cast (arg2, v4v_addr_t);
+
+                if ( unlikely (!guest_handle_okay (src_hnd, 1)) )
+                    goto out;
+                if ( copy_from_guest (&src, src_hnd, 1) )
+                    goto out;
+
+                if ( unlikely (!guest_handle_okay (dst_hnd, 1)) )
+                    goto out;
+                if ( copy_from_guest (&dst, dst_hnd, 1) )
+                    goto out;
+
+                rc = v4v_send (d, &src, &dst, protocol, arg3, len);
+                break;
+            }
+        case V4VOP_sendv:
+            {
+                v4v_addr_t src, dst;
+                uint32_t niov = arg4;
+                uint32_t protocol = arg5;
+                XEN_GUEST_HANDLE (v4v_addr_t) src_hnd =
+                    guest_handle_cast (arg1, v4v_addr_t);
+                XEN_GUEST_HANDLE (v4v_addr_t) dst_hnd =
+                    guest_handle_cast (arg2, v4v_addr_t);
+                XEN_GUEST_HANDLE (v4v_iov_t) iovs =
+                    guest_handle_cast (arg3, v4v_iov_t);
+
+                if ( unlikely (!guest_handle_okay (src_hnd, 1)) )
+                    goto out;
+                if ( copy_from_guest (&src, src_hnd, 1) )
+                    goto out;
+
+                if ( unlikely (!guest_handle_okay (dst_hnd, 1)) )
+                    goto out;
+                if ( copy_from_guest (&dst, dst_hnd, 1) )
+                    goto out;
+
+                if ( unlikely (!guest_handle_okay (iovs, niov)) )
+                    goto out;
+
+                rc = v4v_sendv (d, &src, &dst, protocol, iovs, niov);
+                break;
+            }
+        case V4VOP_notify:
+            {
+                XEN_GUEST_HANDLE (v4v_ring_data_t) ring_data_hnd =
+                    guest_handle_cast (arg1, v4v_ring_data_t);
+                rc = v4v_notify (d, ring_data_hnd);
+                break;
+            }
+        default:
+            rc = -ENOSYS;
+            break;
+    }
+out:
+    domain_unlock (d);
+    v4v_dprintk("<-do_v4v_op()=%d\n", (int) rc);
+    return rc;
+}
+
+/*
+ * init
+ */
+
+void
+v4v_destroy (struct domain *d)
+{
+    int i;
+
+    BUG_ON (!d->is_dying);
+    write_lock (&v4v_lock);
+
+    v4v_dprintk("d->v=%p\n", d->v4v);
+
+    if ( d->v4v )
+    {
+        for ( i = 0; i < V4V_HTABLE_SIZE; ++i )
+        {
+            struct hlist_node *node, *next;
+            struct v4v_ring_info *ring_info;
+
+            hlist_for_each_entry_safe (ring_info, node,
+                    next, &d->v4v->ring_hash[i],
+                    node)
+            {
+                v4v_ring_remove_info (ring_info);
+            }
+        }
+    }
+
+    d->v4v = NULL;
+    write_unlock (&v4v_lock);
+}
+
+int
+v4v_init (struct domain *d)
+{
+    struct v4v_domain *v4v;
+    int i;
+
+    v4v = xmalloc (struct v4v_domain);
+    if ( !v4v )
+        return -ENOMEM;
+
+    rwlock_init (&v4v->lock);
+
+    for ( i = 0; i < V4V_HTABLE_SIZE; ++i )
+    {
+        INIT_HLIST_HEAD (&v4v->ring_hash[i]);
+    }
+
+    write_lock (&v4v_lock);
+    d->v4v = v4v;
+    write_unlock (&v4v_lock);
+
+    return 0;
+}
+
+
+/*
+ * debug
+ */
+
+static void
+dump_domain_ring (struct domain *d, struct v4v_ring_info *ring_info)
+{
+    uint32_t rx_ptr;
+
+    printk (KERN_ERR "  ring: domid=%d port=0x%08x partner=%d npage=%d\n",
+            (int) d->domain_id, (int) ring_info->id.addr.port,
+            (int) ring_info->id.partner, (int) ring_info->npage);
+
+    if ( v4v_ringbuf_get_rx_ptr (d, ring_info, &rx_ptr) )
+    {
+        printk (KERN_ERR "   Failed to read rx_ptr\n");
+        return;
+    }
+
+    printk (KERN_ERR "   tx_ptr=%d rx_ptr=%d len=%d\n",
+            (int) ring_info->tx_ptr, (int) rx_ptr, (int) ring_info->len);
+}
+
+static void
+dump_domain_rings (struct domain *d)
+{
+    int i;
+
+    printk (KERN_ERR " domain %d:\n", (int) d->domain_id);
+
+    read_lock (&d->v4v->lock);
+
+    for ( i = 0; i < V4V_HTABLE_SIZE; ++i )
+    {
+        struct hlist_node *node;
+        struct v4v_ring_info *ring_info;
+
+        hlist_for_each_entry (ring_info, node, &d->v4v->ring_hash[i], node)
+            dump_domain_ring (d, ring_info);
+    }
+    read_unlock (&d->v4v->lock);
+
+    printk (KERN_ERR "\n");
+    v4v_signal_domain (d);
+}
+
+static void
+dump_rings (unsigned char key)
+{
+    struct domain *d;
+
+    printk (KERN_ERR "\n\nV4V ring dump:\n");
+    read_lock (&v4v_lock);
+
+    rcu_read_lock (&domlist_read_lock);
+
+    for_each_domain (d) dump_domain_rings (d);
+
+    rcu_read_unlock (&domlist_read_lock);
+
+    read_unlock (&v4v_lock);
+}
+
+struct keyhandler v4v_info_keyhandler = {
+    .diagnostic = 1,
+    .u.fn = dump_rings,
+    .desc = "dump v4v ring states and intterupt"
+};
+
+static int __init
+setup_dump_rings (void)
+{
+    register_keyhandler ('4', &v4v_info_keyhandler);
+    return 0;
+}
+
+__initcall (setup_dump_rings);
+
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/public/v4v.h b/xen/include/public/v4v.h
new file mode 100644
index 0000000..197770e
--- /dev/null
+++ b/xen/include/public/v4v.h
@@ -0,0 +1,240 @@
+/******************************************************************************
+ * V4V
+ *
+ * Version 2 of v2v (Virtual-to-Virtual)
+ *
+ * Copyright (c) 2010, Citrix Systems
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#ifndef __XEN_PUBLIC_V4V_H__
+#define __XEN_PUBLIC_V4V_H__
+
+#include "xen.h"
+
+/*
+ * Structure definitions
+ */
+
+#define V4V_PROTO_DGRAM		0x3c2c1db8
+#define V4V_PROTO_STREAM 	0x70f6a8e5
+
+#ifdef __i386__
+# define V4V_RING_MAGIC         0xdf6977f231abd910ULL
+# define V4V_PFN_LIST_MAGIC     0x91dd6159045b302dULL
+#else
+# define V4V_RING_MAGIC         0xdf6977f231abd910
+# define V4V_PFN_LIST_MAGIC     0x91dd6159045b302d
+#endif
+#define V4V_DOMID_INVALID       (0x7FFFU)
+#define V4V_DOMID_NONE          V4V_DOMID_INVALID
+#define V4V_DOMID_ANY           V4V_DOMID_INVALID
+#define V4V_PORT_NONE           0
+
+/*
+ * struct v4v_iov
+ * {
+ *      64 bits: iov_base
+ *      64 bits: iov_len
+ * }
+ */
+
+/*
+ * struct v4v_addr
+ * {
+ *      32 bits: port
+ *      16 bits: domid
+ * }
+ */
+
+/*
+ * v4v_ring_id
+ * {
+ *      struct v4v_addr: addr
+ *      16 bits: partner
+ * }
+ */
+
+/*
+ * v4v_ring
+ * {
+ *      64 bits: magic
+ *      v4v_rind_id: id
+ *      32 bits: len
+ *      32 bits: rx_ptr
+ *      32 bits: tx_ptr
+ *      64 bits: padding
+ *      ... : ring
+ * }
+ *
+ * id:
+ * xen only looks at this during register/unregister
+ * and will fill in id.addr.domain
+ *
+ * rx_ptr: rx pointer, modified by domain
+ * tx_ptr: tx pointer, modified by xen
+ */
+
+#ifdef __i386__
+#define V4V_RING_DATA_MAGIC	0x4ce4d30fbc82e92aULL
+#else
+#define V4V_RING_DATA_MAGIC	0x4ce4d30fbc82e92a
+#endif
+
+#define V4V_RING_DATA_F_EMPTY       1U << 0 /* Ring is empty */
+#define V4V_RING_DATA_F_EXISTS      1U << 1 /* Ring exists */
+#define V4V_RING_DATA_F_PENDING     1U << 2 /* Pending interrupt exists - do not
+                                               rely on this field - for
+                                               profiling only */
+#define V4V_RING_DATA_F_SUFFICIENT  1U << 3 /* Sufficient space to queue
+                                               space_required bytes exists */
+
+/*
+ * v4v_ring_data_ent
+ * {
+ *      v4v_addr: ring
+ *      16 bits: flags
+ *      16 bits: padding
+ *      32 bits: space_required
+ *      32 bits: max_message_size
+ * }
+ */
+
+/*
+ * v4v_ring_data
+ * {
+ *      64 bits: magic (V4V_RING_DATA_MAGIC)
+ *      32 bits: nent
+ *      32 bits: padding
+ *      256 bits: reserved
+ *      ... : v4v_ring_data_ent
+ * }
+ */
+
+
+#define V4V_ROUNDUP(a) (((a) +0xf ) & ~0xf)
+/*
+ * Messages on the ring are padded to 128 bits
+ * Len here refers to the exact length of the data not including the
+ * 128 bit header. The message uses
+ * ((len +0xf) & ~0xf) + sizeof(v4v_ring_message_header) bytes
+ */
+
+/*
+ * v4v_stream_header
+ * {
+ *      32 bits: flags
+ *      32 bits: conid
+ * }
+ */
+
+/*
+ * v4v_ring_message_header
+ * {
+ *      32 bits: len
+ *      v4v_addr: source
+ *      32 bits: protocol
+ *      ... : data
+ * }
+ */
+
+/*
+ * HYPERCALLS
+ */
+
+#define V4VOP_register_ring 	1
+/*
+ * Registers a ring with Xen, if a ring with the same v4v_ring_id exists,
+ * this ring takes its place, registration will not change tx_ptr
+ * unless it is invalid
+ *
+ * do_v4v_op(V4VOP_unregister_ring,
+ *           v4v_ring, XEN_GUEST_HANDLE(v4v_pfn),
+ *           NULL, npage, 0)
+ */
+
+
+#define V4VOP_unregister_ring 	2
+/*
+ * Unregister a ring.
+ *
+ * v4v_hypercall(V4VOP_send, v4v_ring, NULL, NULL, 0, 0)
+ */
+
+#define V4VOP_send 		3
+/*
+ * Sends len bytes of buf to dst, giving src as the source address (xen will
+ * ignore src->domain and put your domain in the actually message), xen
+ * first looks for a ring with id.addr==dst and id.partner==sending_domain
+ * if that fails it looks for id.addr==dst and id.partner==DOMID_ANY.
+ * protocol is the 32 bit protocol number used from the message
+ * most likely V4V_PROTO_DGRAM or STREAM. If insufficient space exists
+ * it will return -EAGAIN and xen will twing the V4V_INTERRUPT when
+ * sufficient space becomes available
+ *
+ * v4v_hypercall(V4VOP_send,
+ *               v4v_addr src,
+ *               v4v_addr dst,
+ *               void* buf,
+ *               uint32_t len,
+ *               uint32_t protocol)
+ */
+
+
+#define V4VOP_notify 		4
+/* Asks xen for information about other rings in the system
+ *
+ * ent->ring is the v4v_addr_t of the ring you want information on
+ * the same matching rules are used as for V4VOP_send.
+ *
+ * ent->space_required  if this field is not null xen will check
+ * that there is space in the destination ring for this many bytes
+ * of payload. If there is it will set the V4V_RING_DATA_F_SUFFICIENT
+ * and CANCEL any pending interrupt for that ent->ring, if insufficient
+ * space is available it will schedule an interrupt and the flag will
+ * not be set.
+ *
+ * The flags are set by xen when notify replies
+ * V4V_RING_DATA_F_EMPTY	ring is empty
+ * V4V_RING_DATA_F_PENDING	interrupt is pending - don't rely on this
+ * V4V_RING_DATA_F_SUFFICIENT	sufficient space for space_required is there
+ * V4V_RING_DATA_F_EXISTS	ring exists
+ *
+ * v4v_hypercall(V4VOP_notify,
+ *               XEN_GUEST_HANDLE(v4v_ring_data_ent) ent,
+ *               NULL, NULL, nent, 0)
+ */
+
+
+#define V4VOP_sendv		5
+/*
+ * Identical to V4VOP_send except rather than buf and len it takes
+ * an array of v4v_iov and a length of the array.
+ *
+ * v4v_hypercall(V4VOP_sendv,
+ *               v4v_addr src,
+ *               v4v_addr dst,
+ *               v4v_iov iov,
+ *               uint32_t niov,
+ *               uint32_t protocol)
+ */
+
+#endif /* __XEN_PUBLIC_V4V_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 033cbba..dce0338 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -99,7 +99,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_pfn_t);
 #define __HYPERVISOR_domctl               36
 #define __HYPERVISOR_kexec_op             37
 #define __HYPERVISOR_tmem_op              38
-#define __HYPERVISOR_xc_reserved_op       39 /* reserved for XenClient */
+#define __HYPERVISOR_v4v_op               39
 
 /* Architecture-specific hypercall definitions. */
 #define __HYPERVISOR_arch_0               48
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 53804c8..457e3f2 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -23,6 +23,7 @@
 #include <public/sysctl.h>
 #include <public/vcpu.h>
 #include <public/mem_event.h>
+#include <xen/v4v.h>
 
 #ifdef CONFIG_COMPAT
 #include <compat/vcpu.h>
@@ -350,6 +351,10 @@ struct domain
     nodemask_t node_affinity;
     unsigned int last_alloc_node;
     spinlock_t node_affinity_lock;
+
+    /* v4v */
+    rwlock_t v4v_lock;
+    struct v4v_domain *v4v;
 };
 
 struct domain_setup_info
diff --git a/xen/include/xen/v4v.h b/xen/include/xen/v4v.h
new file mode 100644
index 0000000..641a6a8
--- /dev/null
+++ b/xen/include/xen/v4v.h
@@ -0,0 +1,187 @@
+/******************************************************************************
+ * V4V
+ *
+ * Version 2 of v2v (Virtual-to-Virtual)
+ *
+ * Copyright (c) 2010, Citrix Systems
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#ifndef __V4V_PRIVATE_H__
+#define __V4V_PRIVATE_H__
+
+#include <xen/config.h>
+#include <xen/types.h>
+#include <xen/spinlock.h>
+#include <xen/smp.h>
+#include <xen/shared.h>
+#include <xen/list.h>
+#include <public/v4v.h>
+
+#define V4V_HTABLE_SIZE 32
+
+#define V4V_PACKED __attribute__ ((packed))
+
+/*
+ * Structures
+ */
+
+typedef struct v4v_iov
+{
+    uint64_t iov_base;
+    uint64_t iov_len;
+} V4V_PACKED v4v_iov_t;
+DEFINE_XEN_GUEST_HANDLE (v4v_iov_t);
+
+typedef struct v4v_addr
+{
+    uint32_t port;
+    domid_t domain;
+} V4V_PACKED v4v_addr_t;
+DEFINE_XEN_GUEST_HANDLE (v4v_addr_t);
+
+typedef struct v4v_ring_id
+{
+    struct v4v_addr addr;
+    domid_t partner;
+} V4V_PACKED v4v_ring_id_t;
+
+typedef uint64_t v4v_pfn_t;
+DEFINE_XEN_GUEST_HANDLE (v4v_pfn_t);
+
+typedef struct v4v_ring
+{
+    uint64_t magic;
+    struct v4v_ring_id id;
+    uint32_t len;
+    uint32_t rx_ptr;
+    uint32_t tx_ptr;
+    uint64_t reserved[4];
+    uint8_t ring[0];
+} V4V_PACKED v4v_ring_t;
+DEFINE_XEN_GUEST_HANDLE (v4v_ring_t);
+
+typedef struct v4v_ring_data_ent
+{
+    struct v4v_addr ring;
+    uint16_t flags;
+    uint16_t pad0;
+    uint32_t space_required;
+    uint32_t max_message_size;
+} V4V_PACKED v4v_ring_data_ent_t;
+DEFINE_XEN_GUEST_HANDLE (v4v_ring_data_ent_t);
+
+typedef struct v4v_ring_data
+{
+    uint64_t magic;
+    uint32_t nent;
+    uint32_t padding;
+    uint64_t reserved[4];
+    v4v_ring_data_ent_t ring[0];
+} V4V_PACKED v4v_ring_data_t;
+DEFINE_XEN_GUEST_HANDLE (v4v_ring_data_t);
+
+struct v4v_stream_header
+{
+    uint32_t flags;
+    uint32_t conid;
+} V4V_PACKED;
+
+struct v4v_ring_message_header
+{
+    uint32_t len;
+    struct v4v_addr source;
+    uint32_t protocol;
+    uint8_t data[0];
+} V4V_PACKED;
+
+/*
+ * Helper functions
+ */
+
+
+static inline uint16_t
+v4v_hash_fn (struct v4v_ring_id *id)
+{
+  uint16_t ret;
+  ret = (uint16_t) (id->addr.port >> 16);
+  ret ^= (uint16_t) id->addr.port;
+  ret ^= id->addr.domain;
+  ret ^= id->partner;
+
+  ret &= (V4V_HTABLE_SIZE-1);
+
+  return ret;
+}
+
+struct v4v_pending_ent
+{
+  struct hlist_node node;
+  domid_t id;
+  uint32_t len;
+} V4V_PACKED;
+
+
+struct v4v_ring_info
+{
+  /* next node in the hash, protected by L2  */
+  struct hlist_node node;
+  /* this ring's id, protected by L2 */
+  struct v4v_ring_id id;
+  /* L3 */
+  spinlock_t lock;
+  /* cached length of the ring (from ring->len), protected by L3 */
+  uint32_t len;
+  uint32_t npage;
+  /* cached tx pointer location, protected by L3 */
+  uint32_t tx_ptr;
+  /* guest ring, protected by L3 */
+  XEN_GUEST_HANDLE(v4v_ring_t) ring;
+  /* mapped ring pages protected by L3*/
+  uint8_t **mfn_mapping;
+  /* list of mfns of guest ring */
+  mfn_t *mfns;
+  /* list of struct v4v_pending_ent for this ring, L3 */
+  struct hlist_head pending;
+} V4V_PACKED;
+
+/*
+ * The value of the v4v element in a struct domain is
+ * protected by the global lock L1
+ */
+struct v4v_domain
+{
+  /* L2 */
+  rwlock_t lock;
+  /* protected by L2 */
+  struct hlist_head ring_hash[V4V_HTABLE_SIZE];
+} V4V_PACKED;
+
+void v4v_destroy(struct domain *d);
+int v4v_init(struct domain *d);
+long do_v4v_op (int cmd,
+                XEN_GUEST_HANDLE (void) arg1,
+                XEN_GUEST_HANDLE (void) arg2,
+                XEN_GUEST_HANDLE (void) arg3,
+                uint32_t arg4,
+                uint32_t arg5);
+
+#endif /* __V4V_PRIVATE_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 5/5] v4v: Introduce basic access control to V4V
  2012-06-28 16:26 [PATCH 0/5] RFC: V4V (v2) Jean Guyader
                   ` (3 preceding siblings ...)
  2012-06-28 16:26 ` [PATCH 4/5] xen: Add V4V implementation Jean Guyader
@ 2012-06-28 16:26 ` Jean Guyader
  2012-07-05 14:23   ` Tim Deegan
  4 siblings, 1 reply; 29+ messages in thread
From: Jean Guyader @ 2012-06-28 16:26 UTC (permalink / raw)
  To: xen-devel; +Cc: Jean Guyader

[-- Attachment #1: Type: text/plain, Size: 258 bytes --]


Signed-off-by: Jean Guyader <jean.guyader@citrix.com>
---
 xen/common/v4v.c         |  265 ++++++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/v4v.h |    3 +
 xen/include/xen/v4v.h    |   26 +++++
 3 files changed, 294 insertions(+)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0005-v4v-Introduce-basic-access-control-to-V4V.patch --]
[-- Type: text/x-patch; name="0005-v4v-Introduce-basic-access-control-to-V4V.patch", Size: 9690 bytes --]

diff --git a/xen/common/v4v.c b/xen/common/v4v.c
index e589fda..381fcdd 100644
--- a/xen/common/v4v.c
+++ b/xen/common/v4v.c
@@ -43,6 +43,7 @@
 #endif
 
 
+struct list_head viprules = LIST_HEAD_INIT(viprules);
 
 DEFINE_XEN_GUEST_HANDLE (uint8_t);
 static struct v4v_ring_info *v4v_ring_find_info (struct domain *d,
@@ -1312,7 +1313,221 @@ v4v_notify (struct domain *d,
     return ret;
 }
 
+#ifdef V4V_DEBUG
+void
+v4v_viptables_print_rule (struct v4v_viptables_rule_node *rule)
+{
+  if (rule == NULL)
+    {
+      printk("(null)\n");
+      return;
+    }
+
+  if (rule->accept == 1)
+    printk("ACCEPT");
+  else
+    printk("REJECT");
+
+  printk(" ");
+
+  if (rule->src.domain == DOMID_INVALID)
+    printk("*");
+  else
+    printk("%i", rule->src.domain);
+
+  printk(":");
+
+  if (rule->src.port == -1)
+    printk("*");
+  else
+    printk("%i", rule->src.port);
+
+  printk(" -> ");
+
+  if (rule->dst.domain == DOMID_INVALID)
+    printk("*");
+  else
+    printk("%i", rule->dst.domain);
 
+  printk(":");
+
+  if (rule->dst.port == -1)
+    printk("*");
+  else
+    printk("%i", rule->dst.port);
+
+  printk("\n");
+}
+#endif /* V4V_DEBUG */
+
+int
+v4v_viptables_add (struct domain *src_d, XEN_GUEST_HANDLE(v4v_viptables_rule_t) rule,
+                   int32_t position)
+{
+  struct v4v_viptables_rule_node* new;
+  struct list_head* tmp;
+
+  /* First rule is n.1 */
+  position--;
+
+  new = xmalloc (struct v4v_viptables_rule_node);
+
+  if (copy_field_from_guest (new, rule, src))
+    return -EFAULT;
+  if (copy_field_from_guest (new, rule, dst))
+    return -EFAULT;
+  if (copy_field_from_guest (new, rule, accept))
+    return -EFAULT;
+
+#ifdef V4V_DEBUG
+  printk(KERN_ERR "VIPTables: ");
+  v4v_viptables_print_rule(new);
+#endif /* V4V_DEBUG */
+
+  tmp = &viprules;
+  while (position != 0 && tmp->next != &viprules)
+    {
+      tmp = tmp->next;
+      position--;
+    }
+  list_add(&new->list, tmp);
+
+  return 0;
+}
+
+int
+v4v_viptables_del (struct domain *src_d, XEN_GUEST_HANDLE(v4v_viptables_rule_t) rule,
+                   int32_t position)
+{
+  struct list_head *tmp = NULL;
+  struct list_head *next = NULL;
+  struct v4v_viptables_rule_node *node;
+  struct v4v_viptables_rule *r;
+
+  if (position != -1)
+    {
+      /* We want to delete the rule number <position> */
+      tmp = &viprules;
+      while (position != 0 && tmp->next != &viprules)
+        {
+          tmp = tmp->next;
+          position--;
+        }
+    }
+  else if (!guest_handle_is_null(rule))
+    {
+      /* We want to delete the rule <rule> */
+      r = xmalloc (struct v4v_viptables_rule);
+
+      if (copy_field_from_guest (r, rule, src))
+        return -EFAULT;
+      if (copy_field_from_guest (r, rule, dst))
+        return -EFAULT;
+      if (copy_field_from_guest (r, rule, accept))
+        return -EFAULT;
+
+      list_for_each(tmp, &viprules)
+        {
+          node = list_entry(tmp, struct v4v_viptables_rule_node, list);
+
+          if ((node->src.domain == r->src.domain) &&
+              (node->src.port   == r->src.port)   &&
+              (node->dst.domain == r->dst.domain) &&
+              (node->dst.port   == r->dst.port))
+            {
+              position = 0;
+              break;
+            }
+        }
+      xfree(r);
+    }
+  else
+    {
+      /* We want to flush the rules! */
+      printk(KERN_ERR "VIPTables: flushing rules\n");
+      list_for_each_safe(tmp, next, &viprules)
+        {
+          node = list_entry(tmp, struct v4v_viptables_rule_node, list);
+          list_del(tmp);
+          xfree(node);
+        }
+    }
+
+#ifdef V4V_DEBUG
+  if (position == 0 && tmp != &viprules)
+    {
+      printk(KERN_ERR "VIPTables: deleting rule: ");
+      node = list_entry(tmp, struct v4v_viptables_rule_node, list);
+      v4v_viptables_print_rule(node);
+      list_del(tmp);
+      xfree(node);
+    }
+#endif /* V4V_DEBUG */
+
+  return 0;
+}
+
+static size_t
+v4v_viptables_list (struct domain *src_d, XEN_GUEST_HANDLE(v4v_viptables_list_t) list_hnd)
+{
+  struct list_head *ptr;
+  struct v4v_viptables_rule_node *node;
+  struct v4v_viptables_list rules_list;
+  uint32_t nbrules;
+
+  memset(&rules_list, 0, sizeof (rules_list));
+  if (copy_field_from_guest (&rules_list, list_hnd, nb_rules))
+      return -EFAULT;
+
+  ptr = viprules.next;
+  while (rules_list.nb_rules != 0 && ptr->next != &viprules)
+  {
+      ptr = ptr->next;
+      rules_list.start_rule--;
+  }
+
+  if (rules_list.nb_rules != 0)
+      return -EFAULT;
+
+  nbrules = 0;
+  while (nbrules < rules_list.nb_rules && ptr != &viprules)
+  {
+      node = list_entry(ptr, struct v4v_viptables_rule_node, list);
+
+      rules_list.rules[rules_list.nb_rules].src = node->src;
+      rules_list.rules[rules_list.nb_rules].dst = node->dst;
+      rules_list.rules[rules_list.nb_rules].accept = node->accept;
+
+      nbrules++;
+      ptr = ptr->next;
+  }
+
+  if (copy_to_guest(list_hnd, &rules_list, 1))
+      return -EFAULT;
+
+  return 0;
+}
+
+static size_t
+v4v_viptables_check (v4v_addr_t * src, v4v_addr_t * dst)
+{
+  struct list_head *ptr;
+  struct v4v_viptables_rule_node *node;
+
+  list_for_each(ptr, &viprules)
+    {
+      node = list_entry(ptr, struct v4v_viptables_rule_node, list);
+
+      if ((node->src.domain == DOMID_INVALID || node->src.domain == src->domain) &&
+          (node->src.port   == -1            || node->src.port   == src->port)   &&
+          (node->dst.domain == DOMID_INVALID || node->dst.domain == dst->domain) &&
+          (node->dst.port   == -1            || node->dst.port   == dst->port))
+        return !node->accept;
+    }
+
+  /* Defaulting to ACCEPT */
+  return 0;
+}
 
 /*Hypercall to do the send*/
 static size_t
@@ -1351,6 +1566,15 @@ v4v_send (struct domain *src_d, v4v_addr_t * src_addr,
         return -ECONNREFUSED;
     }
 
+    if (v4v_viptables_check(src_addr, dst_addr) != 0)
+    {
+      read_unlock (&v4v_lock);
+      printk(KERN_ERR "V4V: VIPTables REJECTED %i:%i -> %i:%i\n",
+             src_addr->domain, src_addr->port,
+             dst_addr->domain, dst_addr->port);
+      return -ECONNREFUSED;
+    }
+
     do
     {
         if ( !dst_d->v4v )
@@ -1437,6 +1661,15 @@ v4v_sendv (struct domain *src_d, v4v_addr_t * src_addr,
         return -ECONNREFUSED;
     }
 
+    if (v4v_viptables_check(src_addr, dst_addr) != 0)
+    {
+      read_unlock (&v4v_lock);
+      printk(KERN_ERR "V4V: VIPTables REJECTED %i:%i -> %i:%i\n",
+             src_addr->domain, src_addr->port,
+             dst_addr->domain, dst_addr->port);
+      return -ECONNREFUSED;
+    }
+
     do
     {
         if ( !dst_d->v4v )
@@ -1596,6 +1829,38 @@ do_v4v_op (int cmd, XEN_GUEST_HANDLE (void) arg1,
                 rc = v4v_notify (d, ring_data_hnd);
                 break;
             }
+        case V4VOP_viptables_add:
+            {
+                uint32_t position = arg4;
+                XEN_GUEST_HANDLE (v4v_viptables_rule_t) rule_hnd =
+                    guest_handle_cast (arg1, v4v_viptables_rule_t);
+                rc = -EPERM;
+                if (!IS_PRIV(d))
+                    goto out;
+                rc = v4v_viptables_add (d, rule_hnd, position);
+                break;
+            }
+        case V4VOP_viptables_del:
+            {
+                uint32_t position = arg4;
+                XEN_GUEST_HANDLE (v4v_viptables_rule_t) rule_hnd =
+                    guest_handle_cast (arg1, v4v_viptables_rule_t);
+                rc = -EPERM;
+                if (!IS_PRIV(d))
+                    goto out;
+                rc = v4v_viptables_del (d, rule_hnd, position);
+                break;
+            }
+        case V4VOP_viptables_list:
+            {
+                XEN_GUEST_HANDLE (v4v_viptables_list_t) rules_list_hnd =
+                    guest_handle_cast(arg1, v4v_viptables_list_t);
+                rc = -EPERM;
+                if (!IS_PRIV(d))
+                    goto out;
+                rc = v4v_viptables_list (d, rules_list_hnd);
+                break;
+            }
         default:
             rc = -ENOSYS;
             break;
diff --git a/xen/include/public/v4v.h b/xen/include/public/v4v.h
index 197770e..d8ca507 100644
--- a/xen/include/public/v4v.h
+++ b/xen/include/public/v4v.h
@@ -213,6 +213,9 @@
  *               NULL, NULL, nent, 0)
  */
 
+#define V4VOP_viptables_add     6
+#define V4VOP_viptables_del     7
+#define V4VOP_viptables_list    8
 
 #define V4VOP_sendv		5
 /*
diff --git a/xen/include/xen/v4v.h b/xen/include/xen/v4v.h
index 641a6a8..e5b4cb7 100644
--- a/xen/include/xen/v4v.h
+++ b/xen/include/xen/v4v.h
@@ -103,6 +103,32 @@ struct v4v_ring_message_header
     uint8_t data[0];
 } V4V_PACKED;
 
+typedef struct v4v_viptables_rule
+{
+    struct v4v_addr src;
+    struct v4v_addr dst;
+    uint32_t accept;
+} V4V_PACKED v4v_viptables_rule_t;
+
+DEFINE_XEN_GUEST_HANDLE (v4v_viptables_rule_t);
+
+struct v4v_viptables_rule_node
+{
+    struct list_head list;
+    struct v4v_addr src;
+    struct v4v_addr dst;
+    uint32_t accept;
+} V4V_PACKED;
+
+typedef struct v4v_viptables_list
+{
+    uint32_t start_rule;
+    uint32_t nb_rules;
+    struct v4v_viptables_rule rules[0];
+} V4V_PACKED v4v_viptables_list_t;
+
+DEFINE_XEN_GUEST_HANDLE (v4v_viptables_list_t);
+
 /*
  * Helper functions
  */

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/5] xen: add ssize_t
  2012-06-28 16:26 ` [PATCH 1/5] xen: add ssize_t Jean Guyader
@ 2012-06-29  8:05   ` Jan Beulich
  2012-06-29 10:09     ` Jean Guyader
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2012-06-29  8:05 UTC (permalink / raw)
  To: Jean Guyader; +Cc: xen-devel

>>> On 28.06.12 at 18:26, Jean Guyader <jean.guyader@citrix.com> wrote:

If this is really needed (which I doubt, looking at the two users
and their - respectively - sole callers), then for x86 please put
the definitions alongside the size_t ones.

Until the need for the type is shown, this is a NAK from me.

Jan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/5] v4v: Introduce VIRQ_V4V
  2012-06-28 16:26 ` [PATCH 2/5] v4v: Introduce VIRQ_V4V Jean Guyader
@ 2012-06-29  8:07   ` Jan Beulich
  2012-06-29 10:33     ` Jean Guyader
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2012-06-29  8:07 UTC (permalink / raw)
  To: Jean Guyader; +Cc: xen-devel

>>> On 28.06.12 at 18:26, Jean Guyader <jean.guyader@citrix.com> wrote:

This appears to be unchanged from v1, so the inconsistency
between implementation (per-vCPU vIRQ) and documentation
(global vIRQ) remains.

Jan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/5] xen: Enforce introduce guest_handle_for_field
  2012-06-28 16:26 ` [PATCH 3/5] xen: Enforce introduce guest_handle_for_field Jean Guyader
@ 2012-06-29  8:10   ` Jan Beulich
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Beulich @ 2012-06-29  8:10 UTC (permalink / raw)
  To: Jean Guyader, xen-devel

>>> On 28.06.12 at 18:26, Jean Guyader <jean.guyader@citrix.com> wrote:
> This helper turns a field of a GUEST_HANDLE in
> a GUEST_HANDLE.
> 
> Signed-off-by: Jean Guyader <jean.guyader@citrix.com>

Minor or not - this patch is not from you originally.

Jan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/5] xen: Add V4V implementation
  2012-06-28 16:26 ` [PATCH 4/5] xen: Add V4V implementation Jean Guyader
@ 2012-06-29  8:33   ` Jan Beulich
  2012-06-29 10:03     ` Jean Guyader
  2012-07-05 11:36   ` Tim Deegan
  1 sibling, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2012-06-29  8:33 UTC (permalink / raw)
  To: Jean Guyader; +Cc: xen-devel

>>> On 28.06.12 at 18:26, Jean Guyader <jean.guyader@citrix.com> wrote:
> --- /dev/null
> +++ b/xen/include/public/v4v.h
> @@ -0,0 +1,240 @@
> +/******************************************************************************
> + * V4V
> + *
> + * Version 2 of v2v (Virtual-to-Virtual)
> + *
> + * Copyright (c) 2010, Citrix Systems
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
> + */
> +
> +#ifndef __XEN_PUBLIC_V4V_H__
> +#define __XEN_PUBLIC_V4V_H__
> +
> +#include "xen.h"
> +
> +/*
> + * Structure definitions
> + */
> +
> +#define V4V_PROTO_DGRAM		0x3c2c1db8
> +#define V4V_PROTO_STREAM 	0x70f6a8e5
> +
> +#ifdef __i386__

How about ARM?

> +# define V4V_RING_MAGIC         0xdf6977f231abd910ULL
> +# define V4V_PFN_LIST_MAGIC     0x91dd6159045b302dULL

Is there any reason these can't be also used for 64-bit?

> +#else
> +# define V4V_RING_MAGIC         0xdf6977f231abd910
> +# define V4V_PFN_LIST_MAGIC     0x91dd6159045b302d
> +#endif
> +#define V4V_DOMID_INVALID       (0x7FFFU)

Any reason not to use DOMID_INVALID instead?

> +#define V4V_DOMID_NONE          V4V_DOMID_INVALID
> +#define V4V_DOMID_ANY           V4V_DOMID_INVALID
> +#define V4V_PORT_NONE           0
> +
> +/*
> + * struct v4v_iov
> + * {
> + *      64 bits: iov_base
> + *      64 bits: iov_len
> + * }
> + */

What's the point of not defining the types here?

> +
> +/*
> + * struct v4v_addr
> + * {
> + *      32 bits: port
> + *      16 bits: domid
> + * }
> + */
> +
> +/*
> + * v4v_ring_id
> + * {
> + *      struct v4v_addr: addr
> + *      16 bits: partner
> + * }
> + */
> +
> +/*
> + * v4v_ring
> + * {
> + *      64 bits: magic
> + *      v4v_rind_id: id
> + *      32 bits: len
> + *      32 bits: rx_ptr
> + *      32 bits: tx_ptr
> + *      64 bits: padding
> + *      ... : ring
> + * }
> + *
> + * id:
> + * xen only looks at this during register/unregister
> + * and will fill in id.addr.domain
> + *
> + * rx_ptr: rx pointer, modified by domain
> + * tx_ptr: tx pointer, modified by xen
> + */
> +
> +#ifdef __i386__
> +#define V4V_RING_DATA_MAGIC	0x4ce4d30fbc82e92aULL

Same here as above.

> +#else
> +#define V4V_RING_DATA_MAGIC	0x4ce4d30fbc82e92a
> +#endif
> +
> +#define V4V_RING_DATA_F_EMPTY       1U << 0 /* Ring is empty */
> +#define V4V_RING_DATA_F_EXISTS      1U << 1 /* Ring exists */
> +#define V4V_RING_DATA_F_PENDING     1U << 2 /* Pending interrupt exists - do not
> +                                               rely on this field - for
> +                                               profiling only */
> +#define V4V_RING_DATA_F_SUFFICIENT  1U << 3 /* Sufficient space to queue
> +                                               space_required bytes exists */
> +
> +/*
> + * v4v_ring_data_ent
> + * {
> + *      v4v_addr: ring
> + *      16 bits: flags
> + *      16 bits: padding
> + *      32 bits: space_required
> + *      32 bits: max_message_size
> + * }
> + */
> +
> +/*
> + * v4v_ring_data
> + * {
> + *      64 bits: magic (V4V_RING_DATA_MAGIC)
> + *      32 bits: nent
> + *      32 bits: padding
> + *      256 bits: reserved
> + *      ... : v4v_ring_data_ent
> + * }
> + */
> +
> +
> +#define V4V_ROUNDUP(a) (((a) +0xf ) & ~0xf)
> +/*
> + * Messages on the ring are padded to 128 bits
> + * Len here refers to the exact length of the data not including the
> + * 128 bit header. The message uses
> + * ((len +0xf) & ~0xf) + sizeof(v4v_ring_message_header) bytes
> + */
> +
> +/*
> + * v4v_stream_header
> + * {
> + *      32 bits: flags
> + *      32 bits: conid
> + * }
> + */
> +
> +/*
> + * v4v_ring_message_header
> + * {
> + *      32 bits: len
> + *      v4v_addr: source
> + *      32 bits: protocol
> + *      ... : data
> + * }
> + */
> +
> +/*
> + * HYPERCALLS
> + */
> +
> +#define V4VOP_register_ring 	1
> +/*
> + * Registers a ring with Xen, if a ring with the same v4v_ring_id exists,
> + * this ring takes its place, registration will not change tx_ptr
> + * unless it is invalid
> + *
> + * do_v4v_op(V4VOP_unregister_ring,
> + *           v4v_ring, XEN_GUEST_HANDLE(v4v_pfn),
> + *           NULL, npage, 0)
> + */
> +
> +
> +#define V4VOP_unregister_ring 	2
> +/*
> + * Unregister a ring.
> + *
> + * v4v_hypercall(V4VOP_send, v4v_ring, NULL, NULL, 0, 0)

Assuming this and the earlier do_v4v_op() refer to the same
thing, please use a single name consistently.

> + */
> +
> +#define V4VOP_send 		3
> +/*
> + * Sends len bytes of buf to dst, giving src as the source address (xen will
> + * ignore src->domain and put your domain in the actually message), xen
> + * first looks for a ring with id.addr==dst and id.partner==sending_domain
> + * if that fails it looks for id.addr==dst and id.partner==DOMID_ANY.
> + * protocol is the 32 bit protocol number used from the message
> + * most likely V4V_PROTO_DGRAM or STREAM. If insufficient space exists
> + * it will return -EAGAIN and xen will twing the V4V_INTERRUPT when
> + * sufficient space becomes available
> + *
> + * v4v_hypercall(V4VOP_send,
> + *               v4v_addr src,
> + *               v4v_addr dst,
> + *               void* buf,
> + *               uint32_t len,
> + *               uint32_t protocol)
> + */
> +
> +
> +#define V4VOP_notify 		4
> +/* Asks xen for information about other rings in the system
> + *
> + * ent->ring is the v4v_addr_t of the ring you want information on
> + * the same matching rules are used as for V4VOP_send.
> + *
> + * ent->space_required  if this field is not null xen will check
> + * that there is space in the destination ring for this many bytes
> + * of payload. If there is it will set the V4V_RING_DATA_F_SUFFICIENT
> + * and CANCEL any pending interrupt for that ent->ring, if insufficient
> + * space is available it will schedule an interrupt and the flag will
> + * not be set.
> + *
> + * The flags are set by xen when notify replies
> + * V4V_RING_DATA_F_EMPTY	ring is empty
> + * V4V_RING_DATA_F_PENDING	interrupt is pending - don't rely on this
> + * V4V_RING_DATA_F_SUFFICIENT	sufficient space for space_required is there
> + * V4V_RING_DATA_F_EXISTS	ring exists
> + *
> + * v4v_hypercall(V4VOP_notify,
> + *               XEN_GUEST_HANDLE(v4v_ring_data_ent) ent,
> + *               NULL, NULL, nent, 0)
> + */
> +
> +
> +#define V4VOP_sendv		5
> +/*
> + * Identical to V4VOP_send except rather than buf and len it takes
> + * an array of v4v_iov and a length of the array.
> + *
> + * v4v_hypercall(V4VOP_sendv,
> + *               v4v_addr src,
> + *               v4v_addr dst,
> + *               v4v_iov iov,
> + *               uint32_t niov,
> + *               uint32_t protocol)
> + */
> +
> +#endif /* __XEN_PUBLIC_V4V_H__ */
> --- /dev/null
> +++ b/xen/include/xen/v4v.h
> @@ -0,0 +1,187 @@
> +/******************************************************************************
> + * V4V
> + *
> + * Version 2 of v2v (Virtual-to-Virtual)
> + *
> + * Copyright (c) 2010, Citrix Systems
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
> + */
> +
> +#ifndef __V4V_PRIVATE_H__
> +#define __V4V_PRIVATE_H__
> +
> +#include <xen/config.h>
> +#include <xen/types.h>
> +#include <xen/spinlock.h>
> +#include <xen/smp.h>
> +#include <xen/shared.h>
> +#include <xen/list.h>
> +#include <public/v4v.h>
> +
> +#define V4V_HTABLE_SIZE 32
> +
> +#define V4V_PACKED __attribute__ ((packed))
> +
> +/*
> + * Structures
> + */
> +
> +typedef struct v4v_iov
> +{
> +    uint64_t iov_base;
> +    uint64_t iov_len;
> +} V4V_PACKED v4v_iov_t;

While you moved this to a private header now, I continue to
think that none of the uses of V4V_PACKED are actually
warranted (and hence these can't serve as reason for moving
these public definitions into a private header). Use explicit
padding fields, and you ought to be fine.

> +DEFINE_XEN_GUEST_HANDLE (v4v_iov_t);
> +
> +typedef struct v4v_addr
> +{
> +    uint32_t port;
> +    domid_t domain;
> +} V4V_PACKED v4v_addr_t;
> +DEFINE_XEN_GUEST_HANDLE (v4v_addr_t);
> +
> +typedef struct v4v_ring_id
> +{
> +    struct v4v_addr addr;
> +    domid_t partner;
> +} V4V_PACKED v4v_ring_id_t;
> +
> +typedef uint64_t v4v_pfn_t;
> +DEFINE_XEN_GUEST_HANDLE (v4v_pfn_t);
> +
> +typedef struct v4v_ring
> +{
> +    uint64_t magic;
> +    struct v4v_ring_id id;
> +    uint32_t len;
> +    uint32_t rx_ptr;
> +    uint32_t tx_ptr;
> +    uint64_t reserved[4];
> +    uint8_t ring[0];
> +} V4V_PACKED v4v_ring_t;
> +DEFINE_XEN_GUEST_HANDLE (v4v_ring_t);

If moved into the public header (where they belong imo), apart
from this one none of the guest handle defines should actually be
there, as there's no guest interface that would make use of them
(they're used internally to v4v.c only).

Also, conventionally there's no space before the opening
parenthesis here.

Jan

> +
> +typedef struct v4v_ring_data_ent
> +{
> +    struct v4v_addr ring;
> +    uint16_t flags;
> +    uint16_t pad0;
> +    uint32_t space_required;
> +    uint32_t max_message_size;
> +} V4V_PACKED v4v_ring_data_ent_t;
> +DEFINE_XEN_GUEST_HANDLE (v4v_ring_data_ent_t);
> +
> +typedef struct v4v_ring_data
> +{
> +    uint64_t magic;
> +    uint32_t nent;
> +    uint32_t padding;
> +    uint64_t reserved[4];
> +    v4v_ring_data_ent_t ring[0];
> +} V4V_PACKED v4v_ring_data_t;
> +DEFINE_XEN_GUEST_HANDLE (v4v_ring_data_t);
> +
> +struct v4v_stream_header
> +{
> +    uint32_t flags;
> +    uint32_t conid;
> +} V4V_PACKED;
> +
> +struct v4v_ring_message_header
> +{
> +    uint32_t len;
> +    struct v4v_addr source;
> +    uint32_t protocol;
> +    uint8_t data[0];
> +} V4V_PACKED;
> +
> +/*
> + * Helper functions
> + */
> +
> +
> +static inline uint16_t
> +v4v_hash_fn (struct v4v_ring_id *id)
> +{
> +  uint16_t ret;
> +  ret = (uint16_t) (id->addr.port >> 16);
> +  ret ^= (uint16_t) id->addr.port;
> +  ret ^= id->addr.domain;
> +  ret ^= id->partner;
> +
> +  ret &= (V4V_HTABLE_SIZE-1);
> +
> +  return ret;
> +}
> +
> +struct v4v_pending_ent
> +{
> +  struct hlist_node node;
> +  domid_t id;
> +  uint32_t len;
> +} V4V_PACKED;
> +
> +
> +struct v4v_ring_info
> +{
> +  /* next node in the hash, protected by L2  */
> +  struct hlist_node node;
> +  /* this ring's id, protected by L2 */
> +  struct v4v_ring_id id;
> +  /* L3 */
> +  spinlock_t lock;
> +  /* cached length of the ring (from ring->len), protected by L3 */
> +  uint32_t len;
> +  uint32_t npage;
> +  /* cached tx pointer location, protected by L3 */
> +  uint32_t tx_ptr;
> +  /* guest ring, protected by L3 */
> +  XEN_GUEST_HANDLE(v4v_ring_t) ring;
> +  /* mapped ring pages protected by L3*/
> +  uint8_t **mfn_mapping;
> +  /* list of mfns of guest ring */
> +  mfn_t *mfns;
> +  /* list of struct v4v_pending_ent for this ring, L3 */
> +  struct hlist_head pending;
> +} V4V_PACKED;
> +
> +/*
> + * The value of the v4v element in a struct domain is
> + * protected by the global lock L1
> + */
> +struct v4v_domain
> +{
> +  /* L2 */
> +  rwlock_t lock;
> +  /* protected by L2 */
> +  struct hlist_head ring_hash[V4V_HTABLE_SIZE];
> +} V4V_PACKED;
> +
> +void v4v_destroy(struct domain *d);
> +int v4v_init(struct domain *d);
> +long do_v4v_op (int cmd,
> +                XEN_GUEST_HANDLE (void) arg1,
> +                XEN_GUEST_HANDLE (void) arg2,
> +                XEN_GUEST_HANDLE (void) arg3,
> +                uint32_t arg4,
> +                uint32_t arg5);
> +
> +#endif /* __V4V_PRIVATE_H__ */

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/5] xen: Add V4V implementation
  2012-06-29  8:33   ` Jan Beulich
@ 2012-06-29 10:03     ` Jean Guyader
  2012-06-29 10:36       ` Jan Beulich
  0 siblings, 1 reply; 29+ messages in thread
From: Jean Guyader @ 2012-06-29 10:03 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Jean Guyader, xen-devel

On 29 June 2012 09:33, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 28.06.12 at 18:26, Jean Guyader <jean.guyader@citrix.com> wrote:
>> --- /dev/null
>> +++ b/xen/include/public/v4v.h
>> @@ -0,0 +1,240 @@
>> +/******************************************************************************
>> + * V4V
>> + *
>> + * Version 2 of v2v (Virtual-to-Virtual)
>> + *
>> + * Copyright (c) 2010, Citrix Systems
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
>> + */
>> +
>> +#ifndef __XEN_PUBLIC_V4V_H__
>> +#define __XEN_PUBLIC_V4V_H__
>> +
>> +#include "xen.h"
>> +
>> +/*
>> + * Structure definitions
>> + */
>> +
>> +#define V4V_PROTO_DGRAM              0x3c2c1db8
>> +#define V4V_PROTO_STREAM     0x70f6a8e5
>> +
>> +#ifdef __i386__
>
> How about ARM?
>
>> +# define V4V_RING_MAGIC         0xdf6977f231abd910ULL
>> +# define V4V_PFN_LIST_MAGIC     0x91dd6159045b302dULL
>
> Is there any reason these can't be also used for 64-bit?
>
>> +#else
>> +# define V4V_RING_MAGIC         0xdf6977f231abd910
>> +# define V4V_PFN_LIST_MAGIC     0x91dd6159045b302d
>> +#endif
>> +#define V4V_DOMID_INVALID       (0x7FFFU)
>
> Any reason not to use DOMID_INVALID instead?
>

DOMID_INVALID has changed in the last releases we wanted to have our
definition so
it will be stable accross Xen 3.* Xen 4.*.

>> +#define V4V_DOMID_NONE          V4V_DOMID_INVALID
>> +#define V4V_DOMID_ANY           V4V_DOMID_INVALID
>> +#define V4V_PORT_NONE           0
>> +
>> +/*
>> + * struct v4v_iov
>> + * {
>> + *      64 bits: iov_base
>> + *      64 bits: iov_len
>> + * }
>> + */
>
> What's the point of not defining the types here?

Answer bellow.

>
>> +
>> +/*
>> + * struct v4v_addr
>> + * {
>> + *      32 bits: port
>> + *      16 bits: domid
>> + * }
>> + */
>> +
>> +/*
>> + * v4v_ring_id
>> + * {
>> + *      struct v4v_addr: addr
>> + *      16 bits: partner
>> + * }
>> + */
>> +
>> +/*
>> + * v4v_ring
>> + * {
>> + *      64 bits: magic
>> + *      v4v_rind_id: id
>> + *      32 bits: len
>> + *      32 bits: rx_ptr
>> + *      32 bits: tx_ptr
>> + *      64 bits: padding
>> + *      ... : ring
>> + * }
>> + *
>> + * id:
>> + * xen only looks at this during register/unregister
>> + * and will fill in id.addr.domain
>> + *
>> + * rx_ptr: rx pointer, modified by domain
>> + * tx_ptr: tx pointer, modified by xen
>> + */
>> +
>> +#ifdef __i386__
>> +#define V4V_RING_DATA_MAGIC  0x4ce4d30fbc82e92aULL
>
> Same here as above.
>
>> +#else
>> +#define V4V_RING_DATA_MAGIC  0x4ce4d30fbc82e92a
>> +#endif
>> +
>> +#define V4V_RING_DATA_F_EMPTY       1U << 0 /* Ring is empty */
>> +#define V4V_RING_DATA_F_EXISTS      1U << 1 /* Ring exists */
>> +#define V4V_RING_DATA_F_PENDING     1U << 2 /* Pending interrupt exists - do not
>> +                                               rely on this field - for
>> +                                               profiling only */
>> +#define V4V_RING_DATA_F_SUFFICIENT  1U << 3 /* Sufficient space to queue
>> +                                               space_required bytes exists */
>> +
>> +/*
>> + * v4v_ring_data_ent
>> + * {
>> + *      v4v_addr: ring
>> + *      16 bits: flags
>> + *      16 bits: padding
>> + *      32 bits: space_required
>> + *      32 bits: max_message_size
>> + * }
>> + */
>> +
>> +/*
>> + * v4v_ring_data
>> + * {
>> + *      64 bits: magic (V4V_RING_DATA_MAGIC)
>> + *      32 bits: nent
>> + *      32 bits: padding
>> + *      256 bits: reserved
>> + *      ... : v4v_ring_data_ent
>> + * }
>> + */
>> +
>> +
>> +#define V4V_ROUNDUP(a) (((a) +0xf ) & ~0xf)
>> +/*
>> + * Messages on the ring are padded to 128 bits
>> + * Len here refers to the exact length of the data not including the
>> + * 128 bit header. The message uses
>> + * ((len +0xf) & ~0xf) + sizeof(v4v_ring_message_header) bytes
>> + */
>> +
>> +/*
>> + * v4v_stream_header
>> + * {
>> + *      32 bits: flags
>> + *      32 bits: conid
>> + * }
>> + */
>> +
>> +/*
>> + * v4v_ring_message_header
>> + * {
>> + *      32 bits: len
>> + *      v4v_addr: source
>> + *      32 bits: protocol
>> + *      ... : data
>> + * }
>> + */
>> +
>> +/*
>> + * HYPERCALLS
>> + */
>> +
>> +#define V4VOP_register_ring  1
>> +/*
>> + * Registers a ring with Xen, if a ring with the same v4v_ring_id exists,
>> + * this ring takes its place, registration will not change tx_ptr
>> + * unless it is invalid
>> + *
>> + * do_v4v_op(V4VOP_unregister_ring,
>> + *           v4v_ring, XEN_GUEST_HANDLE(v4v_pfn),
>> + *           NULL, npage, 0)
>> + */
>> +
>> +
>> +#define V4VOP_unregister_ring        2
>> +/*
>> + * Unregister a ring.
>> + *
>> + * v4v_hypercall(V4VOP_send, v4v_ring, NULL, NULL, 0, 0)
>
> Assuming this and the earlier do_v4v_op() refer to the same
> thing, please use a single name consistently.
>

Ack.

>> + */
>> +
>> +#define V4VOP_send           3
>> +/*
>> + * Sends len bytes of buf to dst, giving src as the source address (xen will
>> + * ignore src->domain and put your domain in the actually message), xen
>> + * first looks for a ring with id.addr==dst and id.partner==sending_domain
>> + * if that fails it looks for id.addr==dst and id.partner==DOMID_ANY.
>> + * protocol is the 32 bit protocol number used from the message
>> + * most likely V4V_PROTO_DGRAM or STREAM. If insufficient space exists
>> + * it will return -EAGAIN and xen will twing the V4V_INTERRUPT when
>> + * sufficient space becomes available
>> + *
>> + * v4v_hypercall(V4VOP_send,
>> + *               v4v_addr src,
>> + *               v4v_addr dst,
>> + *               void* buf,
>> + *               uint32_t len,
>> + *               uint32_t protocol)
>> + */
>> +
>> +
>> +#define V4VOP_notify                 4
>> +/* Asks xen for information about other rings in the system
>> + *
>> + * ent->ring is the v4v_addr_t of the ring you want information on
>> + * the same matching rules are used as for V4VOP_send.
>> + *
>> + * ent->space_required  if this field is not null xen will check
>> + * that there is space in the destination ring for this many bytes
>> + * of payload. If there is it will set the V4V_RING_DATA_F_SUFFICIENT
>> + * and CANCEL any pending interrupt for that ent->ring, if insufficient
>> + * space is available it will schedule an interrupt and the flag will
>> + * not be set.
>> + *
>> + * The flags are set by xen when notify replies
>> + * V4V_RING_DATA_F_EMPTY     ring is empty
>> + * V4V_RING_DATA_F_PENDING   interrupt is pending - don't rely on this
>> + * V4V_RING_DATA_F_SUFFICIENT        sufficient space for space_required is there
>> + * V4V_RING_DATA_F_EXISTS    ring exists
>> + *
>> + * v4v_hypercall(V4VOP_notify,
>> + *               XEN_GUEST_HANDLE(v4v_ring_data_ent) ent,
>> + *               NULL, NULL, nent, 0)
>> + */
>> +
>> +
>> +#define V4VOP_sendv          5
>> +/*
>> + * Identical to V4VOP_send except rather than buf and len it takes
>> + * an array of v4v_iov and a length of the array.
>> + *
>> + * v4v_hypercall(V4VOP_sendv,
>> + *               v4v_addr src,
>> + *               v4v_addr dst,
>> + *               v4v_iov iov,
>> + *               uint32_t niov,
>> + *               uint32_t protocol)
>> + */
>> +
>> +#endif /* __XEN_PUBLIC_V4V_H__ */
>> --- /dev/null
>> +++ b/xen/include/xen/v4v.h
>> @@ -0,0 +1,187 @@
>> +/******************************************************************************
>> + * V4V
>> + *
>> + * Version 2 of v2v (Virtual-to-Virtual)
>> + *
>> + * Copyright (c) 2010, Citrix Systems
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
>> + */
>> +
>> +#ifndef __V4V_PRIVATE_H__
>> +#define __V4V_PRIVATE_H__
>> +
>> +#include <xen/config.h>
>> +#include <xen/types.h>
>> +#include <xen/spinlock.h>
>> +#include <xen/smp.h>
>> +#include <xen/shared.h>
>> +#include <xen/list.h>
>> +#include <public/v4v.h>
>> +
>> +#define V4V_HTABLE_SIZE 32
>> +
>> +#define V4V_PACKED __attribute__ ((packed))
>> +
>> +/*
>> + * Structures
>> + */
>> +
>> +typedef struct v4v_iov
>> +{
>> +    uint64_t iov_base;
>> +    uint64_t iov_len;
>> +} V4V_PACKED v4v_iov_t;
>
> While you moved this to a private header now, I continue to
> think that none of the uses of V4V_PACKED are actually
> warranted (and hence these can't serve as reason for moving
> these public definitions into a private header). Use explicit
> padding fields, and you ought to be fine.
>

Answer bellow.

>> +DEFINE_XEN_GUEST_HANDLE (v4v_iov_t);
>> +
>> +typedef struct v4v_addr
>> +{
>> +    uint32_t port;
>> +    domid_t domain;
>> +} V4V_PACKED v4v_addr_t;
>> +DEFINE_XEN_GUEST_HANDLE (v4v_addr_t);
>> +
>> +typedef struct v4v_ring_id
>> +{
>> +    struct v4v_addr addr;
>> +    domid_t partner;
>> +} V4V_PACKED v4v_ring_id_t;
>> +

This structure is really the one that cause trouble. domid_t is 16b
and v4v_addr_t is used
inside v4v_ring_t. I would like the structure to remind as close as we
can from the original version
as we already versions in the field. Having explicit padding will make
all the structures different
which will make much harder to write a driver that will support the
two versions of the API.

Also most all the consumer of those headers will have to rewrite the
structure anyway, for instance
the Linux kernel have it's own naming convention, macros definitions
which are different, etc..

>> +typedef uint64_t v4v_pfn_t;
>> +DEFINE_XEN_GUEST_HANDLE (v4v_pfn_t);
>> +
>> +typedef struct v4v_ring
>> +{
>> +    uint64_t magic;
>> +    struct v4v_ring_id id;
>> +    uint32_t len;
>> +    uint32_t rx_ptr;
>> +    uint32_t tx_ptr;
>> +    uint64_t reserved[4];
>> +    uint8_t ring[0];
>> +} V4V_PACKED v4v_ring_t;
>> +DEFINE_XEN_GUEST_HANDLE (v4v_ring_t);
>
> If moved into the public header (where they belong imo), apart
> from this one none of the guest handle defines should actually be
> there, as there's no guest interface that would make use of them
> (they're used internally to v4v.c only).
>
> Also, conventionally there's no space before the opening
> parenthesis here.
>
> Jan
>
>> +
>> +typedef struct v4v_ring_data_ent
>> +{
>> +    struct v4v_addr ring;
>> +    uint16_t flags;
>> +    uint16_t pad0;
>> +    uint32_t space_required;
>> +    uint32_t max_message_size;
>> +} V4V_PACKED v4v_ring_data_ent_t;
>> +DEFINE_XEN_GUEST_HANDLE (v4v_ring_data_ent_t);
>> +
>> +typedef struct v4v_ring_data
>> +{
>> +    uint64_t magic;
>> +    uint32_t nent;
>> +    uint32_t padding;
>> +    uint64_t reserved[4];
>> +    v4v_ring_data_ent_t ring[0];
>> +} V4V_PACKED v4v_ring_data_t;
>> +DEFINE_XEN_GUEST_HANDLE (v4v_ring_data_t);
>> +
>> +struct v4v_stream_header
>> +{
>> +    uint32_t flags;
>> +    uint32_t conid;
>> +} V4V_PACKED;
>> +
>> +struct v4v_ring_message_header
>> +{
>> +    uint32_t len;
>> +    struct v4v_addr source;
>> +    uint32_t protocol;
>> +    uint8_t data[0];
>> +} V4V_PACKED;
>> +
>> +/*
>> + * Helper functions
>> + */
>> +
>> +
>> +static inline uint16_t
>> +v4v_hash_fn (struct v4v_ring_id *id)
>> +{
>> +  uint16_t ret;
>> +  ret = (uint16_t) (id->addr.port >> 16);
>> +  ret ^= (uint16_t) id->addr.port;
>> +  ret ^= id->addr.domain;
>> +  ret ^= id->partner;
>> +
>> +  ret &= (V4V_HTABLE_SIZE-1);
>> +
>> +  return ret;
>> +}
>> +
>> +struct v4v_pending_ent
>> +{
>> +  struct hlist_node node;
>> +  domid_t id;
>> +  uint32_t len;
>> +} V4V_PACKED;
>> +
>> +
>> +struct v4v_ring_info
>> +{
>> +  /* next node in the hash, protected by L2  */
>> +  struct hlist_node node;
>> +  /* this ring's id, protected by L2 */
>> +  struct v4v_ring_id id;
>> +  /* L3 */
>> +  spinlock_t lock;
>> +  /* cached length of the ring (from ring->len), protected by L3 */
>> +  uint32_t len;
>> +  uint32_t npage;
>> +  /* cached tx pointer location, protected by L3 */
>> +  uint32_t tx_ptr;
>> +  /* guest ring, protected by L3 */
>> +  XEN_GUEST_HANDLE(v4v_ring_t) ring;
>> +  /* mapped ring pages protected by L3*/
>> +  uint8_t **mfn_mapping;
>> +  /* list of mfns of guest ring */
>> +  mfn_t *mfns;
>> +  /* list of struct v4v_pending_ent for this ring, L3 */
>> +  struct hlist_head pending;
>> +} V4V_PACKED;
>> +
>> +/*
>> + * The value of the v4v element in a struct domain is
>> + * protected by the global lock L1
>> + */
>> +struct v4v_domain
>> +{
>> +  /* L2 */
>> +  rwlock_t lock;
>> +  /* protected by L2 */
>> +  struct hlist_head ring_hash[V4V_HTABLE_SIZE];
>> +} V4V_PACKED;
>> +
>> +void v4v_destroy(struct domain *d);
>> +int v4v_init(struct domain *d);
>> +long do_v4v_op (int cmd,
>> +                XEN_GUEST_HANDLE (void) arg1,
>> +                XEN_GUEST_HANDLE (void) arg2,
>> +                XEN_GUEST_HANDLE (void) arg3,
>> +                uint32_t arg4,
>> +                uint32_t arg5);
>> +
>> +#endif /* __V4V_PRIVATE_H__ */
>

Jean

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/5] xen: add ssize_t
  2012-06-29  8:05   ` Jan Beulich
@ 2012-06-29 10:09     ` Jean Guyader
  2012-06-29 10:38       ` Jan Beulich
  0 siblings, 1 reply; 29+ messages in thread
From: Jean Guyader @ 2012-06-29 10:09 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Jean Guyader, xen-devel

On 29 June 2012 09:05, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 28.06.12 at 18:26, Jean Guyader <jean.guyader@citrix.com> wrote:
>
> If this is really needed (which I doubt, looking at the two users
> and their - respectively - sole callers), then for x86 please put
> the definitions alongside the size_t ones.
>
> Until the need for the type is shown, this is a NAK from me.
>

The ssize_t can't be replaced with a size_t because the functions that use it
can return negative number and size_t is unsigned.

ssize_t could be replaced by a long long which will work for all the
case 32 and 64b.

Jean

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/5] v4v: Introduce VIRQ_V4V
  2012-06-29  8:07   ` Jan Beulich
@ 2012-06-29 10:33     ` Jean Guyader
  0 siblings, 0 replies; 29+ messages in thread
From: Jean Guyader @ 2012-06-29 10:33 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Jean Guyader, xen-devel

On 29 June 2012 09:07, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 28.06.12 at 18:26, Jean Guyader <jean.guyader@citrix.com> wrote:
>
> This appears to be unchanged from v1, so the inconsistency
> between implementation (per-vCPU vIRQ) and documentation
> (global vIRQ) remains.
>

Yes, I will change this once I've switched to event channel in a new version.

Jean

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/5] xen: Add V4V implementation
  2012-06-29 10:03     ` Jean Guyader
@ 2012-06-29 10:36       ` Jan Beulich
  2012-07-18 20:09         ` Jean Guyader
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2012-06-29 10:36 UTC (permalink / raw)
  To: Jean Guyader; +Cc: Jean Guyader, xen-devel

>>> On 29.06.12 at 12:03, Jean Guyader <jean.guyader@gmail.com> wrote:
> On 29 June 2012 09:33, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 28.06.12 at 18:26, Jean Guyader <jean.guyader@citrix.com> wrote:
>>> +typedef struct v4v_ring_id
>>> +{
>>> +    struct v4v_addr addr;
>>> +    domid_t partner;
>>> +} V4V_PACKED v4v_ring_id_t;
>>> +
> 
> This structure is really the one that cause trouble. domid_t is 16b
> and v4v_addr_t is used
> inside v4v_ring_t. I would like the structure to remind as close as we
> can from the original version
> as we already versions in the field. Having explicit padding will make
> all the structures different
> which will make much harder to write a driver that will support the
> two versions of the API.

Oh, I see, "partner" would end up on a different offset if the
packed attribute was removed from v4v_addr_t. But that
could still be solved by making this type a union:

typedef union v4v_ring_id
{
    struct v4v_addr addr;
    struct {
        uint32_t port;
        domid_t domain;
        domid_t partner;
    } full;
} v4v_ring_id_t;

That would guarantee binary compatibility. And you could even
achieve source compatibility for gcc users by making the naming
of the second structure conditional upon __GNUC__ being
undefined (or adding a second instance of the same, just
unnamed structure within a respective #ifdef - that would make
it possible to write code that can be compiled by both gcc and
non-gcc, yet existing gcc-only code would need changing).

> Also most all the consumer of those headers will have to rewrite the
> structure anyway, for instance
> the Linux kernel have it's own naming convention, macros definitions
> which are different, etc..

Such can usually be done via scripts, so having a fully defined
public header is still worthwhile.

Jan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/5] xen: add ssize_t
  2012-06-29 10:09     ` Jean Guyader
@ 2012-06-29 10:38       ` Jan Beulich
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Beulich @ 2012-06-29 10:38 UTC (permalink / raw)
  To: Jean Guyader; +Cc: Jean Guyader, xen-devel

>>> On 29.06.12 at 12:09, Jean Guyader <jean.guyader@gmail.com> wrote:
> On 29 June 2012 09:05, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 28.06.12 at 18:26, Jean Guyader <jean.guyader@citrix.com> wrote:
>>
>> If this is really needed (which I doubt, looking at the two users
>> and their - respectively - sole callers), then for x86 please put
>> the definitions alongside the size_t ones.
>>
>> Until the need for the type is shown, this is a NAK from me.
>>
> 
> The ssize_t can't be replaced with a size_t because the functions that use 
> it can return negative number and size_t is unsigned.

Did you really look at the users? One stores the ssize_t value
in a uint32_t (losing the signedness), and the other stores it
into an int (losing the wider size).

> ssize_t could be replaced by a long long which will work for all the
> case 32 and 64b.

A long would do as well.

Jan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/5] xen: Add V4V implementation
  2012-06-28 16:26 ` [PATCH 4/5] xen: Add V4V implementation Jean Guyader
  2012-06-29  8:33   ` Jan Beulich
@ 2012-07-05 11:36   ` Tim Deegan
  1 sibling, 0 replies; 29+ messages in thread
From: Tim Deegan @ 2012-07-05 11:36 UTC (permalink / raw)
  To: Jean Guyader; +Cc: xen-devel

OK, some detailed comments below.  A lot of it is just nits, but one or
two serious concerns.  We still have some ongoing discussion of the
overall design in other threads, too...

At 17:26 +0100 on 28 Jun (1340904385), Jean Guyader wrote:
> +#ifdef V4V_DEBUG
> +#define MY_FILE "v4v.c"

Something wrong with __FILE__ ? 

> +#define v4v_dprintk(format, args...)                    \
> +    do {                                                \
> +        printk("%s:%d " format,                         \
> +               MY_FILE, __LINE__, ## args );            \
> +    } while ( 1 == 0 )
> +#else
> +#define v4v_dprintk(format, ... ) (void)0
> +#endif
> +
> 
> +#ifdef V4V_DEBUG
> +static void
> +v4v_hexdump (void *_p, int len)
> +{
> +    uint8_t *buf = (uint8_t *) _p;
> +    int i, j;
> +
> +    for (i = 0; i < len; i += 16)

Coding style is 'for ( i = 0; i < len; i += 16 )' (and similarly throughout).

> +    {
> +        printk (KERN_ERR "%p:", &buf[i]);
> +        for (j = 0; j < 16; ++j)
> +        {
> +            int k = i + j;
> +            if (k < len)

Likewise 'if ( k < len )'

> +                printk (" %02x", buf[k]);

but 'printk(...)' with no space before the args.  

> +/*
> + * ring buffer
> + */
> +
> +/* called must have L3 */

Maybe make these comments into ASSERT()s?

> +static void
> +v4v_ring_unmap (struct v4v_ring_info *ring_info)
> +{
> +    int i;
> +    for (i = 0; i < ring_info->npage; ++i)
> +    {
> +        if (!ring_info->mfn_mapping[i])
> +            continue;
> +        v4v_dprintk("");

I'm OK with having a lot of compiled-out debug printks, but that's
taking it a bit far. :)

> +/* called must have L3 */
> +static int
> +v4v_memcpy_from_guest_ring (void *_dst, struct v4v_ring_info *ring_info,
> +                            uint32_t offset, uint32_t len)

This function is only ever called to copy the ring_info out of a ring so
it probably doesn't need to be so general (handling multiple pages &c).
I guess the compiler can figure out that offset is always == 0 and trim
the dead code but we might as well cut it from the source too. :)

> +{
> +    int page = offset >> PAGE_SHIFT;
> +    uint8_t *src;
> +    uint8_t *dst = _dst;
> +
> +    offset &= PAGE_SIZE - 1;
> +
> +    while ((offset + len) > PAGE_SIZE)
> +    {
> +        src = v4v_ring_map_page (ring_info, page);
> +
> +        if (!src)
> +        {
> +            return -EFAULT;
> +        }

While I'm kvetching about style, maybe lose the braces around
single-line clauses like this.

> +
> +        v4v_dprintk("memcpy(%p,%p+%d,%d)\n",
> +                dst, src, offset,
> +                (int) (PAGE_SIZE - offset));
> +        memcpy (dst, src + offset, PAGE_SIZE - offset);
> +
> +        page++;
> +        len -= PAGE_SIZE - offset;
> +        dst += PAGE_SIZE - offset;
> +        offset = 0;
> +    }
> +
> +    src = v4v_ring_map_page (ring_info, page);
> +    if (!src)
> +    {
> +        return -EFAULT;
> +    }
> +
> +    v4v_dprintk("memcpy(%p,%p+%d,%d)\n", dst, src, offset, len);
> +    memcpy (dst, src + offset, len);
> +
> +    return 0;
> +}
> +
> +
> +/* called must have L3 */
> +static int
> +v4v_update_tx_ptr (struct v4v_ring_info *ring_info, uint32_t tx_ptr)
> +{
> +    uint8_t *dst = v4v_ring_map_page (ring_info, 0);
> +    volatile uint32_t *p = (uint32_t *)(dst + offsetof (v4v_ring_t, tx_ptr));

What's the intention of using 'volatile' here? 

If it's to make sure you get a single atomic write you should probably
use the write_atomic() macro, which compiles to an explicit asm op of
the right size -- GCC explicitly does _not_ guarantee that it will use
atomic updates (though in practice it surely will).

If you want to make sure the receiver doesn't see the tx update before
the data, I think you need to use explicit memory barriers.  GCC doesn't
guarantee not to reorder other non-volatile accesses past this volatile
one, and even if it did this is common code and you can't rely on x86's
program-order semantics.

> +
> +    if (!dst)
> +        return -EFAULT;
> +    *p = tx_ptr;
> +    return 0;
> +}
> +
> +/* called must have L3 */
> +static int
> +v4v_memcpy_to_guest_ring (struct v4v_ring_info *ring_info, uint32_t offset,
> +        void *_src, uint32_t len)

This function and the _from_guest one are nearly identical, except for
the actual copy and updating the source pointer.  Is there any sensible
way to combine them?  Or would the result be too ugly?

> +static int
> +v4v_ringbuf_get_rx_ptr (struct domain *d, struct v4v_ring_info *ring_info,
> +                        uint32_t * rx_ptr)
> +{
> +    v4v_ring_t *ringp;
> +
> +    if ( ring_info->npage == 0 )
> +        return -1;
> +
> +    ringp = map_domain_page (mfn_x (ring_info->mfns[0]));
> +
> +    v4v_dprintk("v4v_ringbuf_payload_space: mapped %p to %p\n",
> +                (void *) mfn_x (ring_info->mfns[0]), ringp);
> +    if ( !ringp )
> +        return -1;
> +
> +    *rx_ptr = *(volatile uint32_t *) &ringp->rx_ptr;

I have the same comments about 'volatile' as I did above.


> +/*caller must have L3*/
> +static size_t
> +v4v_ringbuf_insert (struct domain *d,
> +                    struct v4v_ring_info *ring_info,
> +                    struct v4v_ring_id *src_id, uint32_t proto,
> +                    XEN_GUEST_HANDLE (void) buf_hnd_void, uint32_t len)
> 
> +static ssize_t
> +v4v_ringbuf_insertv (struct domain *d,
> +                     struct v4v_ring_info *ring_info,
> +                     struct v4v_ring_id *src_id, uint32_t proto,
> +                     XEN_GUEST_HANDLE (v4v_iov_t) iovs, uint32_t niov,
> +                     uint32_t len)

These two functions have a lot of repeated code as well.  

Could insert() be coded as a wrapper around insertv()?  If the
guest-handle-munging is a problem, maybe we could push the same decision
up the stack and only provide the vector version at the hypercall
interface?

> +/*caller must hold W(L2) */
> +static void v4v_ring_remove_mfns (struct v4v_ring_info *ring_info)
> +{
> +    int i;
> +
> +    if ( ring_info->mfns )
> +    {
> +        for ( i=0; i < ring_info->npage; ++i )
> +            if (mfn_x(ring_info->mfns[i]) != 0)
> +                put_page_and_type(mfn_to_page(mfn_x(ring_info->mfns[i])));
> +        xfree (ring_info->mfns);
> +    }
> +    ring_info->mfns = NULL;

I think this should be freeing mfn_mapping too. 

> +#ifdef __i386__
> +# define V4V_RING_MAGIC         0xdf6977f231abd910ULL
> +# define V4V_PFN_LIST_MAGIC     0x91dd6159045b302dULL
> +#else
> +# define V4V_RING_MAGIC         0xdf6977f231abd910
> +# define V4V_PFN_LIST_MAGIC     0x91dd6159045b302d
> +#endif

Why the ifdef (and likewise for other magic numbers in this header)?  

> +#define V4V_DOMID_INVALID       (0x7FFFU)
> +#define V4V_DOMID_NONE          V4V_DOMID_INVALID
> +#define V4V_DOMID_ANY           V4V_DOMID_INVALID

The only one of these actually used in the rest of the patch is 
V4V_DOMID_NONE (in a context where surely _ANY would be better).
Can you get rid of the others?

> +#define V4V_PORT_NONE           0
> +
> +/*
> + * struct v4v_iov
> + * {
> + *      64 bits: iov_base
> + *      64 bits: iov_len
> + * }
> + */

I agree with Jan - it would be better to provide the actual definitions
of these structures, even if non-GCC users might need to post-process or
rewrite the header.

> +#define V4V_RING_DATA_F_EMPTY       1U << 0 /* Ring is empty */
> +#define V4V_RING_DATA_F_EXISTS      1U << 1 /* Ring exists */
> +#define V4V_RING_DATA_F_PENDING     1U << 2 /* Pending interrupt exists - do not
> +                                               rely on this field - for
> +                                               profiling only */
> +#define V4V_RING_DATA_F_SUFFICIENT  1U << 3 /* Sufficient space to queue
> +                                               space_required bytes exists */
> +

Please put parentheses around these flags. 

> +static inline uint16_t
> +v4v_hash_fn (struct v4v_ring_id *id)
> +{
> +  uint16_t ret;

Your indentation has got confused here (and for the rest of this file).

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 5/5] v4v: Introduce basic access control to V4V
  2012-06-28 16:26 ` [PATCH 5/5] v4v: Introduce basic access control to V4V Jean Guyader
@ 2012-07-05 14:23   ` Tim Deegan
  0 siblings, 0 replies; 29+ messages in thread
From: Tim Deegan @ 2012-07-05 14:23 UTC (permalink / raw)
  To: Jean Guyader; +Cc: xen-devel

Hi, 

At 17:26 +0100 on 28 Jun (1340904386), Jean Guyader wrote:
> +#ifdef V4V_DEBUG
> +void
> +v4v_viptables_print_rule (struct v4v_viptables_rule_node *rule)
> +{
> +  if (rule == NULL)
> +    {
> +      printk("(null)\n");
> +      return;
> +    }

The indentation doesn't follow the coding style at all in this patch.

> +int
> +v4v_viptables_add (struct domain *src_d, XEN_GUEST_HANDLE(v4v_viptables_rule_t) rule,
> +                   int32_t position)
> +{
> +  struct v4v_viptables_rule_node* new;
> +  struct list_head* tmp;
> +
> +  /* First rule is n.1 */
> +  position--;
> +
> +  new = xmalloc (struct v4v_viptables_rule_node);

What if xmalloc() fails?

> +  if (copy_field_from_guest (new, rule, src))
> +    return -EFAULT;
> +  if (copy_field_from_guest (new, rule, dst))
> +    return -EFAULT;
> +  if (copy_field_from_guest (new, rule, accept))
> +    return -EFAULT;

Leaking 'new' here.

> +#ifdef V4V_DEBUG
> +  printk(KERN_ERR "VIPTables: ");
> +  v4v_viptables_print_rule(new);
> +#endif /* V4V_DEBUG */
> +
> +  tmp = &viprules;
> +  while (position != 0 && tmp->next != &viprules)
> +    {
> +      tmp = tmp->next;
> +      position--;
> +    }
> +  list_add(&new->list, tmp);

Doesn't this need to be protected by a lock?  AFAICS this function is
called under domain_lock(d) but modifies a global shared list, and the
readers don't take any locks.  If you expect table updates to be rare
then maybe write-locking the L1 lock would suffice.

> +
> +  return 0;
> +}
> +
> +int
> +v4v_viptables_del (struct domain *src_d, XEN_GUEST_HANDLE(v4v_viptables_rule_t) rule,
> +                   int32_t position)
> +{
> +  struct list_head *tmp = NULL;
> +  struct list_head *next = NULL;
> +  struct v4v_viptables_rule_node *node;
> +  struct v4v_viptables_rule *r;
> +
> +  if (position != -1)
> +    {
> +      /* We want to delete the rule number <position> */
> +      tmp = &viprules;
> +      while (position != 0 && tmp->next != &viprules)
> +        {
> +          tmp = tmp->next;
> +          position--;
> +        }
> +    }
> +  else if (!guest_handle_is_null(rule))
> +    {
> +      /* We want to delete the rule <rule> */
> +      r = xmalloc (struct v4v_viptables_rule);

It's probably OK for this to go on the stack - it's not that big, and...

> +      if (copy_field_from_guest (r, rule, src))
> +        return -EFAULT;
> +      if (copy_field_from_guest (r, rule, dst))
> +        return -EFAULT;
> +      if (copy_field_from_guest (r, rule, accept))
> +        return -EFAULT;

... it would stop you leaking 'r' here. 

> +      list_for_each(tmp, &viprules)
> +        {
> +          node = list_entry(tmp, struct v4v_viptables_rule_node, list);
> +
> +          if ((node->src.domain == r->src.domain) &&
> +              (node->src.port   == r->src.port)   &&
> +              (node->dst.domain == r->dst.domain) &&
> +              (node->dst.port   == r->dst.port))
> +            {
> +              position = 0;
> +              break;
> +            }
> +        }
> +      xfree(r);
> +    }
> +  else
> +    {
> +      /* We want to flush the rules! */
> +      printk(KERN_ERR "VIPTables: flushing rules\n");
> +      list_for_each_safe(tmp, next, &viprules)
> +        {
> +          node = list_entry(tmp, struct v4v_viptables_rule_node, list);
> +          list_del(tmp);
> +          xfree(node);
> +        }
> +    }
> +
> +#ifdef V4V_DEBUG
> +  if (position == 0 && tmp != &viprules)
> +    {
> +      printk(KERN_ERR "VIPTables: deleting rule: ");
> +      node = list_entry(tmp, struct v4v_viptables_rule_node, list);
> +      v4v_viptables_print_rule(node);
> +      list_del(tmp);
> +      xfree(node);

This list_del/xfree should definitely not be #ifdef V4V_DEBUG :)

> +    }
> +#endif /* V4V_DEBUG */
> +
> +  return 0;
> +}
> +
> +static size_t
> +v4v_viptables_list (struct domain *src_d, XEN_GUEST_HANDLE(v4v_viptables_list_t) list_hnd)
> +{
> +  struct list_head *ptr;
> +  struct v4v_viptables_rule_node *node;
> +  struct v4v_viptables_list rules_list;
> +  uint32_t nbrules;
> +
> +  memset(&rules_list, 0, sizeof (rules_list));
> +  if (copy_field_from_guest (&rules_list, list_hnd, nb_rules))
> +      return -EFAULT;
> +
> +  ptr = viprules.next;
> +  while (rules_list.nb_rules != 0 && ptr->next != &viprules)
> +  {
> +      ptr = ptr->next;
> +      rules_list.start_rule--;
> +  }
> +
> +  if (rules_list.nb_rules != 0)
> +      return -EFAULT;

Surely s/nb_rules/start_rule/ in both the while() and the if() above?
How much testing has this had?  It seems like this function could never
get as far as copying the rules out.

> +
> +  nbrules = 0;
> +  while (nbrules < rules_list.nb_rules && ptr != &viprules)
> +  {
> +      node = list_entry(ptr, struct v4v_viptables_rule_node, list);
> +
> +      rules_list.rules[rules_list.nb_rules].src = node->src;
> +      rules_list.rules[rules_list.nb_rules].dst = node->dst;
> +      rules_list.rules[rules_list.nb_rules].accept = node->accept;

Aiee!  Good thing it never gets that far, because (a) you're indirecting
a user-supplied distance into a stack array, and (b) the stack array has
zero length.

> +
> +      nbrules++;
> +      ptr = ptr->next;
> +  }
> +
> +  if (copy_to_guest(list_hnd, &rules_list, 1))
> +      return -EFAULT;

And at the end you only copy the header back to the caller.  :|

> +
> +  return 0;
> +}
> +
> +static size_t
> +v4v_viptables_check (v4v_addr_t * src, v4v_addr_t * dst)
> +{
> +  struct list_head *ptr;
> +  struct v4v_viptables_rule_node *node;
> +
> +  list_for_each(ptr, &viprules)
> +    {
> +      node = list_entry(ptr, struct v4v_viptables_rule_node, list);
> +
> +      if ((node->src.domain == DOMID_INVALID || node->src.domain == src->domain) &&

Having defined a new magic V4V_DOMID_* to avoid using DOMID_INVALID, you
should probably use it here.

(oh, and while I'm here please keep lines < 80 characters).

> +          (node->src.port   == -1            || node->src.port   == src->port)   &&

Likewise, why is this not V4V_PORT_NONE? 

> +          (node->dst.domain == DOMID_INVALID || node->dst.domain == dst->domain) &&
> +          (node->dst.port   == -1            || node->dst.port   == dst->port))

And what about protocol?  Protocol seems to have ended up as a bit of a
second-class citizen in v4v; it's defined, and indeed required, but not
used for routing or for acccess control, so all traffic to a given port
_on every protocol_ ends up on the same ring. 

This is the inverse of the TCP/IP namespace that you're copying, where
protocol demux happens before port demux.  And I think it will bite
someone if you ever, for example, want to send ICMP or GRE over a v4v
channel.

> +        return !node->accept;
> +    }
> +
> +  /* Defaulting to ACCEPT */
> +  return 0;
> +}
>  
>  /*Hypercall to do the send*/
>  static size_t
> @@ -1351,6 +1566,15 @@ v4v_send (struct domain *src_d, v4v_addr_t * src_addr,
>          return -ECONNREFUSED;
>      }
>  
> +    if (v4v_viptables_check(src_addr, dst_addr) != 0)
> +    {
> +      read_unlock (&v4v_lock);
> +      printk(KERN_ERR "V4V: VIPTables REJECTED %i:%i -> %i:%i\n",
> +             src_addr->domain, src_addr->port,
> +             dst_addr->domain, dst_addr->port);

This should be at most a gdprintk to stop a badly behaved VM from
spamming the console.

> +      return -ECONNREFUSED;
> +    }
> +
>      do
>      {
>          if ( !dst_d->v4v )
> @@ -1437,6 +1661,15 @@ v4v_sendv (struct domain *src_d, v4v_addr_t * src_addr,
>          return -ECONNREFUSED;
>      }
>  
> +    if (v4v_viptables_check(src_addr, dst_addr) != 0)
> +    {
> +      read_unlock (&v4v_lock);
> +      printk(KERN_ERR "V4V: VIPTables REJECTED %i:%i -> %i:%i\n",
> +             src_addr->domain, src_addr->port,
> +             dst_addr->domain, dst_addr->port);

Likewise.

> +      return -ECONNREFUSED;
> +    }
> +
>      do
>      {
>          if ( !dst_d->v4v )
> @@ -1596,6 +1829,38 @@ do_v4v_op (int cmd, XEN_GUEST_HANDLE (void) arg1,
>                  rc = v4v_notify (d, ring_data_hnd);
>                  break;
>              }
> +        case V4VOP_viptables_add:
> +            {
> +                uint32_t position = arg4;
> +                XEN_GUEST_HANDLE (v4v_viptables_rule_t) rule_hnd =
> +                    guest_handle_cast (arg1, v4v_viptables_rule_t);
> +                rc = -EPERM;
> +                if (!IS_PRIV(d))
> +                    goto out;
> +                rc = v4v_viptables_add (d, rule_hnd, position);
> +                break;
> +            }
> +        case V4VOP_viptables_del:
> +            {
> +                uint32_t position = arg4;
> +                XEN_GUEST_HANDLE (v4v_viptables_rule_t) rule_hnd =
> +                    guest_handle_cast (arg1, v4v_viptables_rule_t);
> +                rc = -EPERM;
> +                if (!IS_PRIV(d))
> +                    goto out;
> +                rc = v4v_viptables_del (d, rule_hnd, position);
> +                break;
> +            }
> +        case V4VOP_viptables_list:
> +            {
> +                XEN_GUEST_HANDLE (v4v_viptables_list_t) rules_list_hnd =
> +                    guest_handle_cast(arg1, v4v_viptables_list_t);
> +                rc = -EPERM;
> +                if (!IS_PRIV(d))
> +                    goto out;
> +                rc = v4v_viptables_list (d, rules_list_hnd);
> +                break;
> +            }
>          default:
>              rc = -ENOSYS;
>              break;
> diff --git a/xen/include/public/v4v.h b/xen/include/public/v4v.h
> index 197770e..d8ca507 100644
> --- a/xen/include/public/v4v.h
> +++ b/xen/include/public/v4v.h
> @@ -213,6 +213,9 @@
>   *               NULL, NULL, nent, 0)
>   */
>  
> +#define V4VOP_viptables_add     6
> +#define V4VOP_viptables_del     7
> +#define V4VOP_viptables_list    8

This will need documentation, and descriptions of the arguments
(assuming the actual definitions don't end up moving back into this
file). 

And they should probably go below the V4VOP_sendv definition.

Cheers,

Tim.

>  #define V4VOP_sendv		5
>  /*
> diff --git a/xen/include/xen/v4v.h b/xen/include/xen/v4v.h
> index 641a6a8..e5b4cb7 100644
> --- a/xen/include/xen/v4v.h
> +++ b/xen/include/xen/v4v.h
> @@ -103,6 +103,32 @@ struct v4v_ring_message_header
>      uint8_t data[0];
>  } V4V_PACKED;
>  
> +typedef struct v4v_viptables_rule
> +{
> +    struct v4v_addr src;
> +    struct v4v_addr dst;
> +    uint32_t accept;
> +} V4V_PACKED v4v_viptables_rule_t;
> +
> +DEFINE_XEN_GUEST_HANDLE (v4v_viptables_rule_t);
> +
> +struct v4v_viptables_rule_node
> +{
> +    struct list_head list;
> +    struct v4v_addr src;
> +    struct v4v_addr dst;
> +    uint32_t accept;
> +} V4V_PACKED;
> +
> +typedef struct v4v_viptables_list
> +{
> +    uint32_t start_rule;
> +    uint32_t nb_rules;
> +    struct v4v_viptables_rule rules[0];
> +} V4V_PACKED v4v_viptables_list_t;
> +
> +DEFINE_XEN_GUEST_HANDLE (v4v_viptables_list_t);
> +
>  /*
>   * Helper functions
>   */

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/5] xen: Add V4V implementation
  2012-06-29 10:36       ` Jan Beulich
@ 2012-07-18 20:09         ` Jean Guyader
  2012-07-19  9:34           ` Andrew Cooper
  2012-07-23  8:18           ` Jan Beulich
  0 siblings, 2 replies; 29+ messages in thread
From: Jean Guyader @ 2012-07-18 20:09 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Jean Guyader, xen-devel

On 29 June 2012 11:36, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 29.06.12 at 12:03, Jean Guyader <jean.guyader@gmail.com> wrote:
>> On 29 June 2012 09:33, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 28.06.12 at 18:26, Jean Guyader <jean.guyader@citrix.com> wrote:
>>>> +typedef struct v4v_ring_id
>>>> +{
>>>> +    struct v4v_addr addr;
>>>> +    domid_t partner;
>>>> +} V4V_PACKED v4v_ring_id_t;
>>>> +
>>
>> This structure is really the one that cause trouble. domid_t is 16b
>> and v4v_addr_t is used
>> inside v4v_ring_t. I would like the structure to remind as close as we
>> can from the original version
>> as we already versions in the field. Having explicit padding will make
>> all the structures different
>> which will make much harder to write a driver that will support the
>> two versions of the API.
>
> Oh, I see, "partner" would end up on a different offset if the
> packed attribute was removed from v4v_addr_t. But that
> could still be solved by making this type a union:
>
> typedef union v4v_ring_id
> {
>     struct v4v_addr addr;
>     struct {
>         uint32_t port;
>         domid_t domain;
>         domid_t partner;
>     } full;
> } v4v_ring_id_t;
>
> That would guarantee binary compatibility. And you could even
> achieve source compatibility for gcc users by making the naming
> of the second structure conditional upon __GNUC__ being
> undefined (or adding a second instance of the same, just
> unnamed structure within a respective #ifdef - that would make
> it possible to write code that can be compiled by both gcc and
> non-gcc, yet existing gcc-only code would need changing).
>
>> Also most all the consumer of those headers will have to rewrite the
>> structure anyway, for instance
>> the Linux kernel have it's own naming convention, macros definitions
>> which are different, etc..
>
> Such can usually be done via scripts, so having a fully defined
> public header is still worthwhile.
>

Hi,

I've been working on this and it work for most of it apart from one case.
Let's take this structure:

struct a
{
    uint64_t a;
    uint32_t b;
    uint16_t c;
    uint16_t d;
    uint32_t e;
    uint32_t f;
    uint32_t g;
    uint8_t  h[32];
    uint8_t  q[0];
};

On 32b with and without packing sizeof the struct a is 60 but on 64b the size of
the struct a is 64 without packing and 60 with packing.
However offset of q is 60 is all the case below.

One option would be to replace in the code sizeof "struct q" with
offset of "q" be
I think there is probably something better that could be done.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/5] xen: Add V4V implementation
  2012-07-18 20:09         ` Jean Guyader
@ 2012-07-19  9:34           ` Andrew Cooper
  2012-07-19  9:58             ` Jean Guyader
  2012-07-23  8:18           ` Jan Beulich
  1 sibling, 1 reply; 29+ messages in thread
From: Andrew Cooper @ 2012-07-19  9:34 UTC (permalink / raw)
  To: Jean Guyader; +Cc: Jean Guyader (3P), Jan Beulich, xen-devel

On 18/07/12 21:09, Jean Guyader wrote:
> On 29 June 2012 11:36, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 29.06.12 at 12:03, Jean Guyader <jean.guyader@gmail.com> wrote:
>>> On 29 June 2012 09:33, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>> On 28.06.12 at 18:26, Jean Guyader <jean.guyader@citrix.com> wrote:
>>>>> +typedef struct v4v_ring_id
>>>>> +{
>>>>> +    struct v4v_addr addr;
>>>>> +    domid_t partner;
>>>>> +} V4V_PACKED v4v_ring_id_t;
>>>>> +
>>> This structure is really the one that cause trouble. domid_t is 16b
>>> and v4v_addr_t is used
>>> inside v4v_ring_t. I would like the structure to remind as close as we
>>> can from the original version
>>> as we already versions in the field. Having explicit padding will make
>>> all the structures different
>>> which will make much harder to write a driver that will support the
>>> two versions of the API.
>> Oh, I see, "partner" would end up on a different offset if the
>> packed attribute was removed from v4v_addr_t. But that
>> could still be solved by making this type a union:
>>
>> typedef union v4v_ring_id
>> {
>>     struct v4v_addr addr;
>>     struct {
>>         uint32_t port;
>>         domid_t domain;
>>         domid_t partner;
>>     } full;
>> } v4v_ring_id_t;
>>
>> That would guarantee binary compatibility. And you could even
>> achieve source compatibility for gcc users by making the naming
>> of the second structure conditional upon __GNUC__ being
>> undefined (or adding a second instance of the same, just
>> unnamed structure within a respective #ifdef - that would make
>> it possible to write code that can be compiled by both gcc and
>> non-gcc, yet existing gcc-only code would need changing).
>>
>>> Also most all the consumer of those headers will have to rewrite the
>>> structure anyway, for instance
>>> the Linux kernel have it's own naming convention, macros definitions
>>> which are different, etc..
>> Such can usually be done via scripts, so having a fully defined
>> public header is still worthwhile.
>>
> Hi,
>
> I've been working on this and it work for most of it apart from one case.
> Let's take this structure:
>
> struct a
> {
>     uint64_t a;
>     uint32_t b;
    uint32_t _pad0;
>     uint16_t c;
>     uint16_t d;
>     uint32_t e;
>     uint32_t f;
>     uint32_t g;
>     uint8_t  h[32];
>     uint8_t  q[0];
> };

Manually padding so the alignment is the same on 32 and 64 bit is the
only way to do this in the public headers, which cant have gcc'isms for
compatibility reasons with other compilers.

~Andrew

>
> On 32b with and without packing sizeof the struct a is 60 but on 64b the size of
> the struct a is 64 without packing and 60 with packing.
> However offset of q is 60 is all the case below.
>
> One option would be to replace in the code sizeof "struct q" with
> offset of "q" be
> I think there is probably something better that could be done.
>
> Thanks,
> Jean
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/5] xen: Add V4V implementation
  2012-07-19  9:58             ` Jean Guyader
@ 2012-07-19  9:54               ` Attilio Rao
  2012-07-19 10:06                 ` Jean Guyader
  2012-07-19 10:42               ` Andrew Cooper
  1 sibling, 1 reply; 29+ messages in thread
From: Attilio Rao @ 2012-07-19  9:54 UTC (permalink / raw)
  To: xen-devel

On 19/07/12 10:58, Jean Guyader wrote:
> On 19 July 2012 10:34, Andrew Cooper<andrew.cooper3@citrix.com>  wrote:
>    
>> On 18/07/12 21:09, Jean Guyader wrote:
>>      
>>> On 29 June 2012 11:36, Jan Beulich<JBeulich@suse.com>  wrote:
>>>        
>>>>>>> On 29.06.12 at 12:03, Jean Guyader<jean.guyader@gmail.com>  wrote:
>>>>>>>                
>>>>> On 29 June 2012 09:33, Jan Beulich<JBeulich@suse.com>  wrote:
>>>>>            
>>>>>>>>> On 28.06.12 at 18:26, Jean Guyader<jean.guyader@citrix.com>  wrote:
>>>>>>>>>                    
>>>>>>> +typedef struct v4v_ring_id
>>>>>>> +{
>>>>>>> +    struct v4v_addr addr;
>>>>>>> +    domid_t partner;
>>>>>>> +} V4V_PACKED v4v_ring_id_t;
>>>>>>> +
>>>>>>>                
>>>>> This structure is really the one that cause trouble. domid_t is 16b
>>>>> and v4v_addr_t is used
>>>>> inside v4v_ring_t. I would like the structure to remind as close as we
>>>>> can from the original version
>>>>> as we already versions in the field. Having explicit padding will make
>>>>> all the structures different
>>>>> which will make much harder to write a driver that will support the
>>>>> two versions of the API.
>>>>>            
>>>> Oh, I see, "partner" would end up on a different offset if the
>>>> packed attribute was removed from v4v_addr_t. But that
>>>> could still be solved by making this type a union:
>>>>
>>>> typedef union v4v_ring_id
>>>> {
>>>>      struct v4v_addr addr;
>>>>      struct {
>>>>          uint32_t port;
>>>>          domid_t domain;
>>>>          domid_t partner;
>>>>      } full;
>>>> } v4v_ring_id_t;
>>>>
>>>> That would guarantee binary compatibility. And you could even
>>>> achieve source compatibility for gcc users by making the naming
>>>> of the second structure conditional upon __GNUC__ being
>>>> undefined (or adding a second instance of the same, just
>>>> unnamed structure within a respective #ifdef - that would make
>>>> it possible to write code that can be compiled by both gcc and
>>>> non-gcc, yet existing gcc-only code would need changing).
>>>>
>>>>          
>>>>> Also most all the consumer of those headers will have to rewrite the
>>>>> structure anyway, for instance
>>>>> the Linux kernel have it's own naming convention, macros definitions
>>>>> which are different, etc..
>>>>>            
>>>> Such can usually be done via scripts, so having a fully defined
>>>> public header is still worthwhile.
>>>>
>>>>          
>>> Hi,
>>>
>>> I've been working on this and it work for most of it apart from one case.
>>> Let's take this structure:
>>>
>>> struct a
>>> {
>>>      uint64_t a;
>>>      uint32_t b;
>>>        
>>      uint32_t _pad0;
>>      
>>>      uint16_t c;
>>>      uint16_t d;
>>>      uint32_t e;
>>>      uint32_t f;
>>>      uint32_t g;
>>>      uint8_t  h[32];
>>>      uint8_t  q[0];
>>> };
>>>        
>> Manually padding so the alignment is the same on 32 and 64 bit is the
>> only way to do this in the public headers, which cant have gcc'isms for
>> compatibility reasons with other compilers.
>>
>>      
> The problem isn't with the individual fields (they are all correctly
> aligned) it is
> the the overall structure size which is 64 even so offset of q is 60
> (and sizeof q
> should be 0).
>
> I think there is no way around it. The structure I have should be
> aligned on 64b anyway.
>
>    

Can you use gcc __attribute__((aligned(64))) for this? Or we try to 
avoid gcc-ism at all?

Attilio

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/5] xen: Add V4V implementation
  2012-07-19  9:34           ` Andrew Cooper
@ 2012-07-19  9:58             ` Jean Guyader
  2012-07-19  9:54               ` Attilio Rao
  2012-07-19 10:42               ` Andrew Cooper
  0 siblings, 2 replies; 29+ messages in thread
From: Jean Guyader @ 2012-07-19  9:58 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Jean Guyader (3P), Jan Beulich, xen-devel

On 19 July 2012 10:34, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 18/07/12 21:09, Jean Guyader wrote:
>> On 29 June 2012 11:36, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 29.06.12 at 12:03, Jean Guyader <jean.guyader@gmail.com> wrote:
>>>> On 29 June 2012 09:33, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>>> On 28.06.12 at 18:26, Jean Guyader <jean.guyader@citrix.com> wrote:
>>>>>> +typedef struct v4v_ring_id
>>>>>> +{
>>>>>> +    struct v4v_addr addr;
>>>>>> +    domid_t partner;
>>>>>> +} V4V_PACKED v4v_ring_id_t;
>>>>>> +
>>>> This structure is really the one that cause trouble. domid_t is 16b
>>>> and v4v_addr_t is used
>>>> inside v4v_ring_t. I would like the structure to remind as close as we
>>>> can from the original version
>>>> as we already versions in the field. Having explicit padding will make
>>>> all the structures different
>>>> which will make much harder to write a driver that will support the
>>>> two versions of the API.
>>> Oh, I see, "partner" would end up on a different offset if the
>>> packed attribute was removed from v4v_addr_t. But that
>>> could still be solved by making this type a union:
>>>
>>> typedef union v4v_ring_id
>>> {
>>>     struct v4v_addr addr;
>>>     struct {
>>>         uint32_t port;
>>>         domid_t domain;
>>>         domid_t partner;
>>>     } full;
>>> } v4v_ring_id_t;
>>>
>>> That would guarantee binary compatibility. And you could even
>>> achieve source compatibility for gcc users by making the naming
>>> of the second structure conditional upon __GNUC__ being
>>> undefined (or adding a second instance of the same, just
>>> unnamed structure within a respective #ifdef - that would make
>>> it possible to write code that can be compiled by both gcc and
>>> non-gcc, yet existing gcc-only code would need changing).
>>>
>>>> Also most all the consumer of those headers will have to rewrite the
>>>> structure anyway, for instance
>>>> the Linux kernel have it's own naming convention, macros definitions
>>>> which are different, etc..
>>> Such can usually be done via scripts, so having a fully defined
>>> public header is still worthwhile.
>>>
>> Hi,
>>
>> I've been working on this and it work for most of it apart from one case.
>> Let's take this structure:
>>
>> struct a
>> {
>>     uint64_t a;
>>     uint32_t b;
>     uint32_t _pad0;
>>     uint16_t c;
>>     uint16_t d;
>>     uint32_t e;
>>     uint32_t f;
>>     uint32_t g;
>>     uint8_t  h[32];
>>     uint8_t  q[0];
>> };
>
> Manually padding so the alignment is the same on 32 and 64 bit is the
> only way to do this in the public headers, which cant have gcc'isms for
> compatibility reasons with other compilers.
>

The problem isn't with the individual fields (they are all correctly
aligned) it is
the the overall structure size which is 64 even so offset of q is 60
(and sizeof q
should be 0).

I think there is no way around it. The structure I have should be
aligned on 64b anyway.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/5] xen: Add V4V implementation
  2012-07-19 10:06                 ` Jean Guyader
@ 2012-07-19 10:04                   ` Attilio Rao
  2012-07-19 10:32                     ` Ian Campbell
  0 siblings, 1 reply; 29+ messages in thread
From: Attilio Rao @ 2012-07-19 10:04 UTC (permalink / raw)
  To: Jean Guyader; +Cc: xen-devel

On 19/07/12 11:06, Jean Guyader wrote:
> On 19 July 2012 10:54, Attilio Rao<attilio.rao@citrix.com>  wrote:
>    
>> On 19/07/12 10:58, Jean Guyader wrote:
>>      
>>> On 19 July 2012 10:34, Andrew Cooper<andrew.cooper3@citrix.com>   wrote:
>>>
>>>        
>>>> On 18/07/12 21:09, Jean Guyader wrote:
>>>>
>>>>          
>>>>> On 29 June 2012 11:36, Jan Beulich<JBeulich@suse.com>   wrote:
>>>>>
>>>>>            
>>>>>>>>> On 29.06.12 at 12:03, Jean Guyader<jean.guyader@gmail.com>   wrote:
>>>>>>>>>
>>>>>>>>>                    
>>>>>>> On 29 June 2012 09:33, Jan Beulich<JBeulich@suse.com>   wrote:
>>>>>>>
>>>>>>>                
>>>>>>>>>>> On 28.06.12 at 18:26, Jean Guyader<jean.guyader@citrix.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>                        
>>>>>>>>> +typedef struct v4v_ring_id
>>>>>>>>> +{
>>>>>>>>> +    struct v4v_addr addr;
>>>>>>>>> +    domid_t partner;
>>>>>>>>> +} V4V_PACKED v4v_ring_id_t;
>>>>>>>>> +
>>>>>>>>>
>>>>>>>>>                    
>>>>>>> This structure is really the one that cause trouble. domid_t is 16b
>>>>>>> and v4v_addr_t is used
>>>>>>> inside v4v_ring_t. I would like the structure to remind as close as we
>>>>>>> can from the original version
>>>>>>> as we already versions in the field. Having explicit padding will make
>>>>>>> all the structures different
>>>>>>> which will make much harder to write a driver that will support the
>>>>>>> two versions of the API.
>>>>>>>
>>>>>>>                
>>>>>> Oh, I see, "partner" would end up on a different offset if the
>>>>>> packed attribute was removed from v4v_addr_t. But that
>>>>>> could still be solved by making this type a union:
>>>>>>
>>>>>> typedef union v4v_ring_id
>>>>>> {
>>>>>>       struct v4v_addr addr;
>>>>>>       struct {
>>>>>>           uint32_t port;
>>>>>>           domid_t domain;
>>>>>>           domid_t partner;
>>>>>>       } full;
>>>>>> } v4v_ring_id_t;
>>>>>>
>>>>>> That would guarantee binary compatibility. And you could even
>>>>>> achieve source compatibility for gcc users by making the naming
>>>>>> of the second structure conditional upon __GNUC__ being
>>>>>> undefined (or adding a second instance of the same, just
>>>>>> unnamed structure within a respective #ifdef - that would make
>>>>>> it possible to write code that can be compiled by both gcc and
>>>>>> non-gcc, yet existing gcc-only code would need changing).
>>>>>>
>>>>>>
>>>>>>              
>>>>>>> Also most all the consumer of those headers will have to rewrite the
>>>>>>> structure anyway, for instance
>>>>>>> the Linux kernel have it's own naming convention, macros definitions
>>>>>>> which are different, etc..
>>>>>>>
>>>>>>>                
>>>>>> Such can usually be done via scripts, so having a fully defined
>>>>>> public header is still worthwhile.
>>>>>>
>>>>>>
>>>>>>              
>>>>> Hi,
>>>>>
>>>>> I've been working on this and it work for most of it apart from one
>>>>> case.
>>>>> Let's take this structure:
>>>>>
>>>>> struct a
>>>>> {
>>>>>       uint64_t a;
>>>>>       uint32_t b;
>>>>>
>>>>>            
>>>>       uint32_t _pad0;
>>>>
>>>>          
>>>>>       uint16_t c;
>>>>>       uint16_t d;
>>>>>       uint32_t e;
>>>>>       uint32_t f;
>>>>>       uint32_t g;
>>>>>       uint8_t  h[32];
>>>>>       uint8_t  q[0];
>>>>> };
>>>>>
>>>>>            
>>>> Manually padding so the alignment is the same on 32 and 64 bit is the
>>>> only way to do this in the public headers, which cant have gcc'isms for
>>>> compatibility reasons with other compilers.
>>>>
>>>>
>>>>          
>>> The problem isn't with the individual fields (they are all correctly
>>> aligned) it is
>>> the the overall structure size which is 64 even so offset of q is 60
>>> (and sizeof q
>>> should be 0).
>>>
>>> I think there is no way around it. The structure I have should be
>>> aligned on 64b anyway.
>>>
>>>
>>>        
>>
>> Can you use gcc __attribute__((aligned(64))) for this? Or we try to avoid
>> gcc-ism at all?
>>
>>      
> I'm trying to avoid any compiler specific statement. I'll added some
> manual padding
> to have it explicitly aligned to 64b.
>
>    
Ideally though you can have an intermediate header where you workout 
compiler specific code (in particular if you are almost sure your code 
will always be likely compiled with gcc).
So, you can have an header which does the following:

#ifdef __GNUC__
#define _compiler_align(x) __attribute__((__aligned__(x)))
#elif __INTEL_COMPILER
#define _compiler_align(x) foo
#else
#define _compiler_align(x)
#endif

Probabily for something like that you want to use some header that is 
known to be included in a lot of files.

Attilio

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/5] xen: Add V4V implementation
  2012-07-19  9:54               ` Attilio Rao
@ 2012-07-19 10:06                 ` Jean Guyader
  2012-07-19 10:04                   ` Attilio Rao
  0 siblings, 1 reply; 29+ messages in thread
From: Jean Guyader @ 2012-07-19 10:06 UTC (permalink / raw)
  To: Attilio Rao; +Cc: xen-devel

On 19 July 2012 10:54, Attilio Rao <attilio.rao@citrix.com> wrote:
> On 19/07/12 10:58, Jean Guyader wrote:
>>
>> On 19 July 2012 10:34, Andrew Cooper<andrew.cooper3@citrix.com>  wrote:
>>
>>>
>>> On 18/07/12 21:09, Jean Guyader wrote:
>>>
>>>>
>>>> On 29 June 2012 11:36, Jan Beulich<JBeulich@suse.com>  wrote:
>>>>
>>>>>>>>
>>>>>>>> On 29.06.12 at 12:03, Jean Guyader<jean.guyader@gmail.com>  wrote:
>>>>>>>>
>>>>>>
>>>>>> On 29 June 2012 09:33, Jan Beulich<JBeulich@suse.com>  wrote:
>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 28.06.12 at 18:26, Jean Guyader<jean.guyader@citrix.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>
>>>>>>>> +typedef struct v4v_ring_id
>>>>>>>> +{
>>>>>>>> +    struct v4v_addr addr;
>>>>>>>> +    domid_t partner;
>>>>>>>> +} V4V_PACKED v4v_ring_id_t;
>>>>>>>> +
>>>>>>>>
>>>>>>
>>>>>> This structure is really the one that cause trouble. domid_t is 16b
>>>>>> and v4v_addr_t is used
>>>>>> inside v4v_ring_t. I would like the structure to remind as close as we
>>>>>> can from the original version
>>>>>> as we already versions in the field. Having explicit padding will make
>>>>>> all the structures different
>>>>>> which will make much harder to write a driver that will support the
>>>>>> two versions of the API.
>>>>>>
>>>>>
>>>>> Oh, I see, "partner" would end up on a different offset if the
>>>>> packed attribute was removed from v4v_addr_t. But that
>>>>> could still be solved by making this type a union:
>>>>>
>>>>> typedef union v4v_ring_id
>>>>> {
>>>>>      struct v4v_addr addr;
>>>>>      struct {
>>>>>          uint32_t port;
>>>>>          domid_t domain;
>>>>>          domid_t partner;
>>>>>      } full;
>>>>> } v4v_ring_id_t;
>>>>>
>>>>> That would guarantee binary compatibility. And you could even
>>>>> achieve source compatibility for gcc users by making the naming
>>>>> of the second structure conditional upon __GNUC__ being
>>>>> undefined (or adding a second instance of the same, just
>>>>> unnamed structure within a respective #ifdef - that would make
>>>>> it possible to write code that can be compiled by both gcc and
>>>>> non-gcc, yet existing gcc-only code would need changing).
>>>>>
>>>>>
>>>>>>
>>>>>> Also most all the consumer of those headers will have to rewrite the
>>>>>> structure anyway, for instance
>>>>>> the Linux kernel have it's own naming convention, macros definitions
>>>>>> which are different, etc..
>>>>>>
>>>>>
>>>>> Such can usually be done via scripts, so having a fully defined
>>>>> public header is still worthwhile.
>>>>>
>>>>>
>>>>
>>>> Hi,
>>>>
>>>> I've been working on this and it work for most of it apart from one
>>>> case.
>>>> Let's take this structure:
>>>>
>>>> struct a
>>>> {
>>>>      uint64_t a;
>>>>      uint32_t b;
>>>>
>>>
>>>      uint32_t _pad0;
>>>
>>>>
>>>>      uint16_t c;
>>>>      uint16_t d;
>>>>      uint32_t e;
>>>>      uint32_t f;
>>>>      uint32_t g;
>>>>      uint8_t  h[32];
>>>>      uint8_t  q[0];
>>>> };
>>>>
>>>
>>> Manually padding so the alignment is the same on 32 and 64 bit is the
>>> only way to do this in the public headers, which cant have gcc'isms for
>>> compatibility reasons with other compilers.
>>>
>>>
>>
>> The problem isn't with the individual fields (they are all correctly
>> aligned) it is
>> the the overall structure size which is 64 even so offset of q is 60
>> (and sizeof q
>> should be 0).
>>
>> I think there is no way around it. The structure I have should be
>> aligned on 64b anyway.
>>
>>
>
>
> Can you use gcc __attribute__((aligned(64))) for this? Or we try to avoid
> gcc-ism at all?
>

I'm trying to avoid any compiler specific statement. I'll added some
manual padding
to have it explicitly aligned to 64b.

Jean

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/5] xen: Add V4V implementation
  2012-07-19 10:04                   ` Attilio Rao
@ 2012-07-19 10:32                     ` Ian Campbell
  0 siblings, 0 replies; 29+ messages in thread
From: Ian Campbell @ 2012-07-19 10:32 UTC (permalink / raw)
  To: Attilio Rao; +Cc: Jean Guyader, xen-devel


> Ideally though you can have an intermediate header where you workout 
> compiler specific code (in particular if you are almost sure your code 
> will always be likely compiled with gcc).
> So, you can have an header which does the following:
[...]
> Probabily for something like that you want to use some header that is 
> known to be included in a lot of files.

For better or worse the policy for xen/include/public is that remains
pure ANSI C. So no magic per-compiler headers etc.

Ian.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/5] xen: Add V4V implementation
  2012-07-19  9:58             ` Jean Guyader
  2012-07-19  9:54               ` Attilio Rao
@ 2012-07-19 10:42               ` Andrew Cooper
  2012-07-19 11:33                 ` Stefano Stabellini
  2012-07-19 11:58                 ` Jean Guyader
  1 sibling, 2 replies; 29+ messages in thread
From: Andrew Cooper @ 2012-07-19 10:42 UTC (permalink / raw)
  To: Jean Guyader; +Cc: Jean Guyader (3P), Jan Beulich, xen-devel


On 19/07/12 10:58, Jean Guyader wrote:
> On 19 July 2012 10:34, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 18/07/12 21:09, Jean Guyader wrote:
>>> On 29 June 2012 11:36, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>> On 29.06.12 at 12:03, Jean Guyader <jean.guyader@gmail.com> wrote:
>>>>> On 29 June 2012 09:33, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>>>> On 28.06.12 at 18:26, Jean Guyader <jean.guyader@citrix.com> wrote:
>>>>>>> +typedef struct v4v_ring_id
>>>>>>> +{
>>>>>>> +    struct v4v_addr addr;
>>>>>>> +    domid_t partner;
>>>>>>> +} V4V_PACKED v4v_ring_id_t;
>>>>>>> +
>>>>> This structure is really the one that cause trouble. domid_t is 16b
>>>>> and v4v_addr_t is used
>>>>> inside v4v_ring_t. I would like the structure to remind as close as we
>>>>> can from the original version
>>>>> as we already versions in the field. Having explicit padding will make
>>>>> all the structures different
>>>>> which will make much harder to write a driver that will support the
>>>>> two versions of the API.
>>>> Oh, I see, "partner" would end up on a different offset if the
>>>> packed attribute was removed from v4v_addr_t. But that
>>>> could still be solved by making this type a union:
>>>>
>>>> typedef union v4v_ring_id
>>>> {
>>>>     struct v4v_addr addr;
>>>>     struct {
>>>>         uint32_t port;
>>>>         domid_t domain;
>>>>         domid_t partner;
>>>>     } full;
>>>> } v4v_ring_id_t;
>>>>
>>>> That would guarantee binary compatibility. And you could even
>>>> achieve source compatibility for gcc users by making the naming
>>>> of the second structure conditional upon __GNUC__ being
>>>> undefined (or adding a second instance of the same, just
>>>> unnamed structure within a respective #ifdef - that would make
>>>> it possible to write code that can be compiled by both gcc and
>>>> non-gcc, yet existing gcc-only code would need changing).
>>>>
>>>>> Also most all the consumer of those headers will have to rewrite the
>>>>> structure anyway, for instance
>>>>> the Linux kernel have it's own naming convention, macros definitions
>>>>> which are different, etc..
>>>> Such can usually be done via scripts, so having a fully defined
>>>> public header is still worthwhile.
>>>>
>>> Hi,
>>>
>>> I've been working on this and it work for most of it apart from one case.
>>> Let's take this structure:
>>>
>>> struct a
>>> {
>>>     uint64_t a;
>>>     uint32_t b;
>>     uint32_t _pad0;
>>>     uint16_t c;
>>>     uint16_t d;
>>>     uint32_t e;
>>>     uint32_t f;
>>>     uint32_t g;
>>>     uint8_t  h[32];
>>>     uint8_t  q[0];
>>> };
>> Manually padding so the alignment is the same on 32 and 64 bit is the
>> only way to do this in the public headers, which cant have gcc'isms for
>> compatibility reasons with other compilers.
>>
> The problem isn't with the individual fields (they are all correctly
> aligned) it is
> the the overall structure size which is 64 even so offset of q is 60
> (and sizeof q
> should be 0).
>
> I think there is no way around it. The structure I have should be
> aligned on 64b anyway.
>
> Thanks,
> Jean

Ah yes - silly me.  I understand your problem now

struct b
{
    uint64_t a;
    uint32_t b;
    uint16_t c;
    uint16_t d;
    uint32_t e;
    uint32_t f;
    uint32_t g;
    uint8_t  h[32];
    union { uint8_t  q[0]; uint32_t _pad; } u;
};

This works for me on gcc and gives identical sizeof and offsetof results
on both 32 and 64bit.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/5] xen: Add V4V implementation
  2012-07-19 10:42               ` Andrew Cooper
@ 2012-07-19 11:33                 ` Stefano Stabellini
  2012-07-19 11:40                   ` Andrew Cooper
  2012-07-19 11:58                 ` Jean Guyader
  1 sibling, 1 reply; 29+ messages in thread
From: Stefano Stabellini @ 2012-07-19 11:33 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Jan Beulich, Jean Guyader (3P), Jean Guyader, xen-devel

On Thu, 19 Jul 2012, Andrew Cooper wrote:
> On 19/07/12 10:58, Jean Guyader wrote:
> > On 19 July 2012 10:34, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> >> On 18/07/12 21:09, Jean Guyader wrote:
> >>> On 29 June 2012 11:36, Jan Beulich <JBeulich@suse.com> wrote:
> >>>>>>> On 29.06.12 at 12:03, Jean Guyader <jean.guyader@gmail.com> wrote:
> >>>>> On 29 June 2012 09:33, Jan Beulich <JBeulich@suse.com> wrote:
> >>>>>>>>> On 28.06.12 at 18:26, Jean Guyader <jean.guyader@citrix.com> wrote:
> >>>>>>> +typedef struct v4v_ring_id
> >>>>>>> +{
> >>>>>>> +    struct v4v_addr addr;
> >>>>>>> +    domid_t partner;
> >>>>>>> +} V4V_PACKED v4v_ring_id_t;
> >>>>>>> +
> >>>>> This structure is really the one that cause trouble. domid_t is 16b
> >>>>> and v4v_addr_t is used
> >>>>> inside v4v_ring_t. I would like the structure to remind as close as we
> >>>>> can from the original version
> >>>>> as we already versions in the field. Having explicit padding will make
> >>>>> all the structures different
> >>>>> which will make much harder to write a driver that will support the
> >>>>> two versions of the API.
> >>>> Oh, I see, "partner" would end up on a different offset if the
> >>>> packed attribute was removed from v4v_addr_t. But that
> >>>> could still be solved by making this type a union:
> >>>>
> >>>> typedef union v4v_ring_id
> >>>> {
> >>>>     struct v4v_addr addr;
> >>>>     struct {
> >>>>         uint32_t port;
> >>>>         domid_t domain;
> >>>>         domid_t partner;
> >>>>     } full;
> >>>> } v4v_ring_id_t;
> >>>>
> >>>> That would guarantee binary compatibility. And you could even
> >>>> achieve source compatibility for gcc users by making the naming
> >>>> of the second structure conditional upon __GNUC__ being
> >>>> undefined (or adding a second instance of the same, just
> >>>> unnamed structure within a respective #ifdef - that would make
> >>>> it possible to write code that can be compiled by both gcc and
> >>>> non-gcc, yet existing gcc-only code would need changing).
> >>>>
> >>>>> Also most all the consumer of those headers will have to rewrite the
> >>>>> structure anyway, for instance
> >>>>> the Linux kernel have it's own naming convention, macros definitions
> >>>>> which are different, etc..
> >>>> Such can usually be done via scripts, so having a fully defined
> >>>> public header is still worthwhile.
> >>>>
> >>> Hi,
> >>>
> >>> I've been working on this and it work for most of it apart from one case.
> >>> Let's take this structure:
> >>>
> >>> struct a
> >>> {
> >>>     uint64_t a;
> >>>     uint32_t b;
> >>     uint32_t _pad0;
> >>>     uint16_t c;
> >>>     uint16_t d;
> >>>     uint32_t e;
> >>>     uint32_t f;
> >>>     uint32_t g;
> >>>     uint8_t  h[32];
> >>>     uint8_t  q[0];
> >>> };
> >> Manually padding so the alignment is the same on 32 and 64 bit is the
> >> only way to do this in the public headers, which cant have gcc'isms for
> >> compatibility reasons with other compilers.
> >>
> > The problem isn't with the individual fields (they are all correctly
> > aligned) it is
> > the the overall structure size which is 64 even so offset of q is 60
> > (and sizeof q
> > should be 0).
> >
> > I think there is no way around it. The structure I have should be
> > aligned on 64b anyway.
> >
> > Thanks,
> > Jean
> 
> Ah yes - silly me.  I understand your problem now
> 
> struct b
> {
>     uint64_t a;
>     uint32_t b;
>     uint16_t c;
>     uint16_t d;
>     uint32_t e;
>     uint32_t f;
>     uint32_t g;
>     uint8_t  h[32];
>     union { uint8_t  q[0]; uint32_t _pad; } u;
> };
> 
> This works for me on gcc and gives identical sizeof and offsetof results
> on both 32 and 64bit.

Yes, but then the access to q becomes b.u.q unless we use anonymous
unions that are only available in C11.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/5] xen: Add V4V implementation
  2012-07-19 11:33                 ` Stefano Stabellini
@ 2012-07-19 11:40                   ` Andrew Cooper
  0 siblings, 0 replies; 29+ messages in thread
From: Andrew Cooper @ 2012-07-19 11:40 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Jan Beulich, Jean Guyader (3P), Jean Guyader, xen-devel

On 19/07/12 12:33, Stefano Stabellini wrote:
>>> The problem isn't with the individual fields (they are all correctly
>>> aligned) it is
>>> the the overall structure size which is 64 even so offset of q is 60
>>> (and sizeof q
>>> should be 0).
>>>
>>> I think there is no way around it. The structure I have should be
>>> aligned on 64b anyway.
>>>
>>> Thanks,
>>> Jean
>> Ah yes - silly me.  I understand your problem now
>>
>> struct b
>> {
>>     uint64_t a;
>>     uint32_t b;
>>     uint16_t c;
>>     uint16_t d;
>>     uint32_t e;
>>     uint32_t f;
>>     uint32_t g;
>>     uint8_t  h[32];
>>     union { uint8_t  q[0]; uint32_t _pad; } u;
>> };
>>
>> This works for me on gcc and gives identical sizeof and offsetof results
>> on both 32 and 64bit.
> Yes, but then the access to q becomes b.u.q unless we use anonymous
> unions that are only available in C11.

Combine it with the Linux style of

#define q u.q

although with rather more sensible names than q and u, or just use a
macro for all accesses to q from b.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/5] xen: Add V4V implementation
  2012-07-19 10:42               ` Andrew Cooper
  2012-07-19 11:33                 ` Stefano Stabellini
@ 2012-07-19 11:58                 ` Jean Guyader
  1 sibling, 0 replies; 29+ messages in thread
From: Jean Guyader @ 2012-07-19 11:58 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Jean Guyader (3P), Jan Beulich, xen-devel

On 19 July 2012 11:42, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> On 19/07/12 10:58, Jean Guyader wrote:
>> On 19 July 2012 10:34, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> On 18/07/12 21:09, Jean Guyader wrote:
>>>> On 29 June 2012 11:36, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>>> On 29.06.12 at 12:03, Jean Guyader <jean.guyader@gmail.com> wrote:
>>>>>> On 29 June 2012 09:33, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>>>>> On 28.06.12 at 18:26, Jean Guyader <jean.guyader@citrix.com> wrote:
>>>>>>>> +typedef struct v4v_ring_id
>>>>>>>> +{
>>>>>>>> +    struct v4v_addr addr;
>>>>>>>> +    domid_t partner;
>>>>>>>> +} V4V_PACKED v4v_ring_id_t;
>>>>>>>> +
>>>>>> This structure is really the one that cause trouble. domid_t is 16b
>>>>>> and v4v_addr_t is used
>>>>>> inside v4v_ring_t. I would like the structure to remind as close as we
>>>>>> can from the original version
>>>>>> as we already versions in the field. Having explicit padding will make
>>>>>> all the structures different
>>>>>> which will make much harder to write a driver that will support the
>>>>>> two versions of the API.
>>>>> Oh, I see, "partner" would end up on a different offset if the
>>>>> packed attribute was removed from v4v_addr_t. But that
>>>>> could still be solved by making this type a union:
>>>>>
>>>>> typedef union v4v_ring_id
>>>>> {
>>>>>     struct v4v_addr addr;
>>>>>     struct {
>>>>>         uint32_t port;
>>>>>         domid_t domain;
>>>>>         domid_t partner;
>>>>>     } full;
>>>>> } v4v_ring_id_t;
>>>>>
>>>>> That would guarantee binary compatibility. And you could even
>>>>> achieve source compatibility for gcc users by making the naming
>>>>> of the second structure conditional upon __GNUC__ being
>>>>> undefined (or adding a second instance of the same, just
>>>>> unnamed structure within a respective #ifdef - that would make
>>>>> it possible to write code that can be compiled by both gcc and
>>>>> non-gcc, yet existing gcc-only code would need changing).
>>>>>
>>>>>> Also most all the consumer of those headers will have to rewrite the
>>>>>> structure anyway, for instance
>>>>>> the Linux kernel have it's own naming convention, macros definitions
>>>>>> which are different, etc..
>>>>> Such can usually be done via scripts, so having a fully defined
>>>>> public header is still worthwhile.
>>>>>
>>>> Hi,
>>>>
>>>> I've been working on this and it work for most of it apart from one case.
>>>> Let's take this structure:
>>>>
>>>> struct a
>>>> {
>>>>     uint64_t a;
>>>>     uint32_t b;
>>>     uint32_t _pad0;
>>>>     uint16_t c;
>>>>     uint16_t d;
>>>>     uint32_t e;
>>>>     uint32_t f;
>>>>     uint32_t g;
>>>>     uint8_t  h[32];
>>>>     uint8_t  q[0];
>>>> };
>>> Manually padding so the alignment is the same on 32 and 64 bit is the
>>> only way to do this in the public headers, which cant have gcc'isms for
>>> compatibility reasons with other compilers.
>>>
>> The problem isn't with the individual fields (they are all correctly
>> aligned) it is
>> the the overall structure size which is 64 even so offset of q is 60
>> (and sizeof q
>> should be 0).
>>
>> I think there is no way around it. The structure I have should be
>> aligned on 64b anyway.
>>
>> Thanks,
>> Jean
>
> Ah yes - silly me.  I understand your problem now
>
> struct b
> {
>     uint64_t a;
>     uint32_t b;
>     uint16_t c;
>     uint16_t d;
>     uint32_t e;
>     uint32_t f;
>     uint32_t g;
>     uint8_t  h[32];
>     union { uint8_t  q[0]; uint32_t _pad; } u;
> };
>

The code assumes that sizeof q is offset of q so that won't really work.
I'll add some padding in h and I'll loose the binary compatibility I
think I have no
choice.

Thanks for you help though.
Jean

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/5] xen: Add V4V implementation
  2012-07-18 20:09         ` Jean Guyader
  2012-07-19  9:34           ` Andrew Cooper
@ 2012-07-23  8:18           ` Jan Beulich
  1 sibling, 0 replies; 29+ messages in thread
From: Jan Beulich @ 2012-07-23  8:18 UTC (permalink / raw)
  To: Jean Guyader; +Cc: Jean Guyader, xen-devel

>>> On 18.07.12 at 22:09, Jean Guyader <jean.guyader@gmail.com> wrote:
> On 29 June 2012 11:36, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 29.06.12 at 12:03, Jean Guyader <jean.guyader@gmail.com> wrote:
>>> On 29 June 2012 09:33, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>> On 28.06.12 at 18:26, Jean Guyader <jean.guyader@citrix.com> wrote:
>>>>> +typedef struct v4v_ring_id
>>>>> +{
>>>>> +    struct v4v_addr addr;
>>>>> +    domid_t partner;
>>>>> +} V4V_PACKED v4v_ring_id_t;
>>>>> +
>>>
>>> This structure is really the one that cause trouble. domid_t is 16b
>>> and v4v_addr_t is used
>>> inside v4v_ring_t. I would like the structure to remind as close as we
>>> can from the original version
>>> as we already versions in the field. Having explicit padding will make
>>> all the structures different
>>> which will make much harder to write a driver that will support the
>>> two versions of the API.
>>
>> Oh, I see, "partner" would end up on a different offset if the
>> packed attribute was removed from v4v_addr_t. But that
>> could still be solved by making this type a union:
>>
>> typedef union v4v_ring_id
>> {
>>     struct v4v_addr addr;
>>     struct {
>>         uint32_t port;
>>         domid_t domain;
>>         domid_t partner;
>>     } full;
>> } v4v_ring_id_t;
>>
>> That would guarantee binary compatibility. And you could even
>> achieve source compatibility for gcc users by making the naming
>> of the second structure conditional upon __GNUC__ being
>> undefined (or adding a second instance of the same, just
>> unnamed structure within a respective #ifdef - that would make
>> it possible to write code that can be compiled by both gcc and
>> non-gcc, yet existing gcc-only code would need changing).
>>
>>> Also most all the consumer of those headers will have to rewrite the
>>> structure anyway, for instance
>>> the Linux kernel have it's own naming convention, macros definitions
>>> which are different, etc..
>>
>> Such can usually be done via scripts, so having a fully defined
>> public header is still worthwhile.
>>
> 
> Hi,
> 
> I've been working on this and it work for most of it apart from one case.
> Let's take this structure:
> 
> struct a
> {
>     uint64_t a;
>     uint32_t b;
>     uint16_t c;
>     uint16_t d;
>     uint32_t e;
>     uint32_t f;
>     uint32_t g;
>     uint8_t  h[32];
>     uint8_t  q[0];
> };
> 
> On 32b with and without packing sizeof the struct a is 60 but on 64b the 
> size of
> the struct a is 64 without packing and 60 with packing.
> However offset of q is 60 is all the case below.
> 
> One option would be to replace in the code sizeof "struct q" with
> offset of "q" be
> I think there is probably something better that could be done.

Not sure what you're trying to tell me here, but I'd like to note
that sizeof() is mostly meaningless on a structure with a trailing
[0] member anyway, so all you're after is that all offsetof()-s
are identical, which you say they are.

Jan

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2012-07-23  8:18 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-06-28 16:26 [PATCH 0/5] RFC: V4V (v2) Jean Guyader
2012-06-28 16:26 ` [PATCH 1/5] xen: add ssize_t Jean Guyader
2012-06-29  8:05   ` Jan Beulich
2012-06-29 10:09     ` Jean Guyader
2012-06-29 10:38       ` Jan Beulich
2012-06-28 16:26 ` [PATCH 2/5] v4v: Introduce VIRQ_V4V Jean Guyader
2012-06-29  8:07   ` Jan Beulich
2012-06-29 10:33     ` Jean Guyader
2012-06-28 16:26 ` [PATCH 3/5] xen: Enforce introduce guest_handle_for_field Jean Guyader
2012-06-29  8:10   ` Jan Beulich
2012-06-28 16:26 ` [PATCH 4/5] xen: Add V4V implementation Jean Guyader
2012-06-29  8:33   ` Jan Beulich
2012-06-29 10:03     ` Jean Guyader
2012-06-29 10:36       ` Jan Beulich
2012-07-18 20:09         ` Jean Guyader
2012-07-19  9:34           ` Andrew Cooper
2012-07-19  9:58             ` Jean Guyader
2012-07-19  9:54               ` Attilio Rao
2012-07-19 10:06                 ` Jean Guyader
2012-07-19 10:04                   ` Attilio Rao
2012-07-19 10:32                     ` Ian Campbell
2012-07-19 10:42               ` Andrew Cooper
2012-07-19 11:33                 ` Stefano Stabellini
2012-07-19 11:40                   ` Andrew Cooper
2012-07-19 11:58                 ` Jean Guyader
2012-07-23  8:18           ` Jan Beulich
2012-07-05 11:36   ` Tim Deegan
2012-06-28 16:26 ` [PATCH 5/5] v4v: Introduce basic access control to V4V Jean Guyader
2012-07-05 14:23   ` Tim Deegan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.