* [PATCH 00/18] introduce the Xen PV Calls backend
@ 2017-05-15 20:35 Stefano Stabellini
From: Stefano Stabellini @ 2017-05-15 20:35 UTC (permalink / raw)
  To: xen-devel; +Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky

Hi all,

This series introduces the backend for the newly introduced PV Calls
protocol.

PV Calls is a paravirtualized protocol that allows a set of POSIX
functions to be implemented in a different domain. The PV Calls frontend
forwards POSIX function calls to the backend, which executes them and
returns the result to the frontend.

For more information about PV Calls, please read:

https://xenbits.xen.org/docs/unstable/misc/pvcalls.html
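
To give a rough idea of the model (an illustrative sketch, not code from
this series): each POSIX call becomes a fixed-size request on a shared
command ring, and the backend replies with a response carrying the
return value:

    /* frontend side, illustrative only: socket(AF_INET, SOCK_STREAM, 0) */
    struct xen_pvcalls_request *req = RING_GET_REQUEST(&front_ring, prod);
    req->req_id = next_id++;                /* echoed back in the response */
    req->cmd = PVCALLS_SOCKET;
    req->u.socket.id = (uint64_t)local_handle; /* opaque to the backend */
    req->u.socket.domain = AF_INET;
    req->u.socket.type = SOCK_STREAM;
    req->u.socket.protocol = 0;
    /* then push the request and notify the backend over the event channel */

The backend executes the call and writes back a struct
xen_pvcalls_response with the same req_id and the return value in ret.
front_ring, prod, next_id and local_handle are placeholder names.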

I tried to split the source code into small pieces to make it easier to
read and understand. Please review!


Stefano Stabellini (18):
      xen: introduce the pvcalls interface header
      xen/pvcalls: introduce the pvcalls xenbus backend
      xen/pvcalls: initialize the module and register the xenbus backend
      xen/pvcalls: xenbus state handling
      xen/pvcalls: connect to a frontend
      xen/pvcalls: handle commands from the frontend
      xen/pvcalls: implement socket command
      xen/pvcalls: implement connect command
      xen/pvcalls: implement bind command
      xen/pvcalls: implement listen command
      xen/pvcalls: implement accept command
      xen/pvcalls: implement poll command
      xen/pvcalls: implement release command
      xen/pvcalls: disconnect and module_exit
      xen/pvcalls: introduce the ioworker
      xen/pvcalls: implement read
      xen/pvcalls: implement write
      xen: introduce a Kconfig option to enable the pvcalls backend

 drivers/xen/Kconfig                |   12 +
 drivers/xen/Makefile               |    1 +
 drivers/xen/pvcalls-back.c         | 1317 ++++++++++++++++++++++++++++++++++++
 include/xen/interface/io/pvcalls.h |  117 ++++
 4 files changed, 1447 insertions(+)
 create mode 100644 drivers/xen/pvcalls-back.c
 create mode 100644 include/xen/interface/io/pvcalls.h


* [PATCH 01/18] xen: introduce the pvcalls interface header
From: Stefano Stabellini @ 2017-05-15 20:35 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky,
	Stefano Stabellini, konrad.wilk

Introduce the C header file which defines the PV Calls interface. It is
imported from xen/include/public/io/pvcalls.h.
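
The dummy members of the two unions below exist to keep the structures
the same size on 32-bit and 64-bit guests. A hypothetical compile-time
check (not part of this patch) of the intended on-ring sizes:

    /* assumed invariants: 8-byte header + 56-byte union = 64 bytes for
     * requests, 16-byte header + 8-byte union = 24 bytes for responses */
    BUILD_BUG_ON(sizeof(struct xen_pvcalls_request)  != 64);
    BUILD_BUG_ON(sizeof(struct xen_pvcalls_response) != 24);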

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: konrad.wilk@oracle.com
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 include/xen/interface/io/pvcalls.h | 117 +++++++++++++++++++++++++++++++++++++
 1 file changed, 117 insertions(+)
 create mode 100644 include/xen/interface/io/pvcalls.h

diff --git a/include/xen/interface/io/pvcalls.h b/include/xen/interface/io/pvcalls.h
new file mode 100644
index 0000000..c438c1b
--- /dev/null
+++ b/include/xen/interface/io/pvcalls.h
@@ -0,0 +1,117 @@
+#ifndef __XEN_PUBLIC_IO_XEN_PVCALLS_H__
+#define __XEN_PUBLIC_IO_XEN_PVCALLS_H__
+
+#include <linux/net.h>
+#include "xen/interface/io/ring.h"
+
+/*
+ * See docs/misc/pvcalls.markdown in xen.git for the full specification:
+ * https://xenbits.xen.org/docs/unstable/misc/pvcalls.html
+ */
+struct pvcalls_data_intf {
+    RING_IDX in_cons, in_prod, in_error;
+
+    uint8_t pad1[52];
+
+    RING_IDX out_cons, out_prod, out_error;
+
+    uint8_t pad2[52];
+
+    RING_IDX ring_order;
+    grant_ref_t ref[];
+};
+DEFINE_XEN_FLEX_RING(pvcalls);
+
+#define PVCALLS_SOCKET         0
+#define PVCALLS_CONNECT        1
+#define PVCALLS_RELEASE        2
+#define PVCALLS_BIND           3
+#define PVCALLS_LISTEN         4
+#define PVCALLS_ACCEPT         5
+#define PVCALLS_POLL           6
+
+struct xen_pvcalls_request {
+    uint32_t req_id; /* private to guest, echoed in response */
+    uint32_t cmd;    /* command to execute */
+    union {
+        struct xen_pvcalls_socket {
+            uint64_t id;
+            uint32_t domain;
+            uint32_t type;
+            uint32_t protocol;
+        } socket;
+        struct xen_pvcalls_connect {
+            uint64_t id;
+            uint8_t addr[28];
+            uint32_t len;
+            uint32_t flags;
+            grant_ref_t ref;
+            uint32_t evtchn;
+        } connect;
+        struct xen_pvcalls_release {
+            uint64_t id;
+            uint8_t reuse;
+        } release;
+        struct xen_pvcalls_bind {
+            uint64_t id;
+            uint8_t addr[28];
+            uint32_t len;
+        } bind;
+        struct xen_pvcalls_listen {
+            uint64_t id;
+            uint32_t backlog;
+        } listen;
+        struct xen_pvcalls_accept {
+            uint64_t id;
+            uint64_t id_new;
+            grant_ref_t ref;
+            uint32_t evtchn;
+        } accept;
+        struct xen_pvcalls_poll {
+            uint64_t id;
+        } poll;
+        /* dummy member to force sizeof(struct xen_pvcalls_request)
+         * to match across archs */
+        struct xen_pvcalls_dummy {
+            uint8_t dummy[56];
+        } dummy;
+    } u;
+};
+
+struct xen_pvcalls_response {
+    uint32_t req_id;
+    uint32_t cmd;
+    int32_t ret;
+    uint32_t pad;
+    union {
+        struct _xen_pvcalls_socket {
+            uint64_t id;
+        } socket;
+        struct _xen_pvcalls_connect {
+            uint64_t id;
+        } connect;
+        struct _xen_pvcalls_release {
+            uint64_t id;
+        } release;
+        struct _xen_pvcalls_bind {
+            uint64_t id;
+        } bind;
+        struct _xen_pvcalls_listen {
+            uint64_t id;
+        } listen;
+        struct _xen_pvcalls_accept {
+            uint64_t id;
+        } accept;
+        struct _xen_pvcalls_poll {
+            uint64_t id;
+        } poll;
+        struct _xen_pvcalls_dummy {
+            uint8_t dummy[8];
+        } dummy;
+    } u;
+};
+
+DEFINE_RING_TYPES(xen_pvcalls, struct xen_pvcalls_request,
+                  struct xen_pvcalls_response);
+
+#endif
-- 
1.9.1

* [PATCH 02/18] xen/pvcalls: introduce the pvcalls xenbus backend
From: Stefano Stabellini @ 2017-05-15 20:35 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

Introduce a xenbus backend for the pvcalls protocol, as defined by
https://xenbits.xen.org/docs/unstable/misc/pvcalls.html.

This patch only adds the stubs; the actual code will be added by the
following patches.
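
For context, xenbus matches the "pvcalls" id below against backend
device nodes created by the toolstack. The paths here are an
illustrative sketch of the usual layout (see the spec above for the
authoritative naming):

    /local/domain/<backend-id>/backend/pvcalls/<frontend-id>/<device-id>/
    /local/domain/<frontend-id>/device/pvcalls/<device-id>/

probe() runs when such a backend node appears, and otherend_changed()
fires on every frontend state transition.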

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-back.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)
 create mode 100644 drivers/xen/pvcalls-back.c

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
new file mode 100644
index 0000000..2dbf7d8
--- /dev/null
+++ b/drivers/xen/pvcalls-back.c
@@ -0,0 +1,61 @@
+/*
+ * (c) 2017 Stefano Stabellini <stefano@aporeto.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/kthread.h>
+#include <linux/list.h>
+#include <linux/radix-tree.h>
+#include <linux/module.h>
+#include <linux/rwsem.h>
+#include <linux/wait.h>
+
+#include <xen/events.h>
+#include <xen/grant_table.h>
+#include <xen/xen.h>
+#include <xen/xenbus.h>
+#include <xen/interface/io/pvcalls.h>
+
+static int pvcalls_back_probe(struct xenbus_device *dev,
+			      const struct xenbus_device_id *id)
+{
+	return 0;
+}
+
+static void pvcalls_back_changed(struct xenbus_device *dev,
+				 enum xenbus_state frontend_state)
+{
+}
+
+static int pvcalls_back_remove(struct xenbus_device *dev)
+{
+	return 0;
+}
+
+static int pvcalls_back_uevent(struct xenbus_device *xdev,
+			       struct kobj_uevent_env *env)
+{
+	return 0;
+}
+
+static const struct xenbus_device_id pvcalls_back_ids[] = {
+	{ "pvcalls" },
+	{ "" }
+};
+
+static struct xenbus_driver pvcalls_back_driver = {
+	.ids = pvcalls_back_ids,
+	.probe = pvcalls_back_probe,
+	.remove = pvcalls_back_remove,
+	.uevent = pvcalls_back_uevent,
+	.otherend_changed = pvcalls_back_changed,
+};
-- 
1.9.1

* [PATCH 03/18] xen/pvcalls: initialize the module and register the xenbus backend
From: Stefano Stabellini @ 2017-05-15 20:35 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

The pvcalls backend has one ioworker per CPU: the ioworkers are
implemented as a CPU-bound workqueue and deal with the actual socket and
data ring reads/writes.

The ioworkers are global: there is only one set for all the frontends.
Each ioworker processes the requests on its wqs list in order; once it
is done with a request, it removes it from the list. A spinlock protects
the list. Each ioworker is bound to a different CPU to maximize
throughput.
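
As a sketch of how a socket might later be handed to one of these
ioworkers (hypothetical helper and names; the real dispatch code arrives
with the ioworker patch later in the series):

    static void ioworker_enqueue(struct pvcalls_ioworker *iow,
                                 struct list_head *entry)
    {
        unsigned long flags;

        spin_lock_irqsave(&iow->lock, flags);
        list_add_tail(entry, &iow->wqs);    /* FIFO: processed in order */
        spin_unlock_irqrestore(&iow->lock, flags);

        /* kick the work item on this worker's dedicated CPU */
        queue_work_on(iow->num, pvcalls_back_global.wq,
                      &iow->register_work);
    }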

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-back.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 2dbf7d8..46a889a 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -25,6 +25,26 @@
 #include <xen/xenbus.h>
 #include <xen/interface/io/pvcalls.h>
 
+struct pvcalls_ioworker {
+	struct work_struct register_work;
+	atomic_t io;
+	struct list_head wqs;
+	spinlock_t lock;
+	int num;
+};
+
+struct pvcalls_back_global {
+	struct pvcalls_ioworker *ioworkers;
+	int nr_ioworkers;
+	struct workqueue_struct *wq;
+	struct list_head privs;
+	struct rw_semaphore privs_lock;
+} pvcalls_back_global;
+
+static void pvcalls_back_ioworker(struct work_struct *work)
+{
+}
+
 static int pvcalls_back_probe(struct xenbus_device *dev,
 			      const struct xenbus_device_id *id)
 {
@@ -59,3 +79,47 @@ static int pvcalls_back_uevent(struct xenbus_device *xdev,
 	.uevent = pvcalls_back_uevent,
 	.otherend_changed = pvcalls_back_changed,
 };
+
+static int __init pvcalls_back_init(void)
+{
+	int ret, i, cpu;
+
+	if (!xen_domain())
+		return -ENODEV;
+
+	ret = xenbus_register_backend(&pvcalls_back_driver);
+	if (ret < 0)
+		return ret;
+
+	init_rwsem(&pvcalls_back_global.privs_lock);
+	INIT_LIST_HEAD(&pvcalls_back_global.privs);
+	pvcalls_back_global.wq = alloc_workqueue("pvcalls_io", 0, 0);
+	if (!pvcalls_back_global.wq)
+		goto error;
+	pvcalls_back_global.nr_ioworkers = num_online_cpus();
+	pvcalls_back_global.ioworkers = kzalloc(
+		sizeof(*pvcalls_back_global.ioworkers) *
+		pvcalls_back_global.nr_ioworkers, GFP_KERNEL);
+	if (!pvcalls_back_global.ioworkers)
+		goto error;
+	i = 0;
+	for_each_online_cpu(cpu) {
+		pvcalls_back_global.ioworkers[i].num = i;
+		atomic_set(&pvcalls_back_global.ioworkers[i].io, 1);
+		spin_lock_init(&pvcalls_back_global.ioworkers[i].lock);
+		INIT_LIST_HEAD(&pvcalls_back_global.ioworkers[i].wqs);
+		INIT_WORK(&pvcalls_back_global.ioworkers[i].register_work,
+			pvcalls_back_ioworker);
+		i++;
+	}
+	return 0;
+
+error:
+	if (pvcalls_back_global.wq)
+		destroy_workqueue(pvcalls_back_global.wq);
+	xenbus_unregister_driver(&pvcalls_back_driver);
+	kfree(pvcalls_back_global.ioworkers);
+	memset(&pvcalls_back_global, 0, sizeof(pvcalls_back_global));
+	return -ENOMEM;
+}
+module_init(pvcalls_back_init);
-- 
1.9.1

* [PATCH 04/18] xen/pvcalls: xenbus state handling
From: Stefano Stabellini @ 2017-05-15 20:35 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

Introduce the code to handle xenbus state changes.

Implement the probe function for the pvcalls backend. Write the
supported versions, max-page-order and function-calls nodes to xenstore,
as required by the protocol.

Introduce stub functions for connecting to and disconnecting from a
frontend.
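
After a successful probe, the backend's xenstore directory would contain
something like the following (values illustrative; max-page-order
depends on XENBUS_MAX_RING_GRANT_ORDER):

    .../backend/pvcalls/<frontend-id>/<device-id>/versions = "1"
    .../backend/pvcalls/<frontend-id>/<device-id>/max-page-order = "9"
    .../backend/pvcalls/<frontend-id>/<device-id>/function-calls = "1"

where function-calls = "1" advertises socket, connect, release, bind,
listen, accept and poll.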

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-back.c | 133 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 133 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 46a889a..86eca19 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -25,6 +25,9 @@
 #include <xen/xenbus.h>
 #include <xen/interface/io/pvcalls.h>
 
+#define PVCALLS_VERSIONS "1"
+#define MAX_RING_ORDER XENBUS_MAX_RING_GRANT_ORDER
+
 struct pvcalls_ioworker {
 	struct work_struct register_work;
 	atomic_t io;
@@ -45,15 +48,145 @@ static void pvcalls_back_ioworker(struct work_struct *work)
 {
 }
 
+static int backend_connect(struct xenbus_device *dev)
+{
+	return 0;
+}
+
+static int backend_disconnect(struct xenbus_device *dev)
+{
+	return 0;
+}
+
 static int pvcalls_back_probe(struct xenbus_device *dev,
 			      const struct xenbus_device_id *id)
 {
+	int err;
+
+	err = xenbus_printf(XBT_NIL, dev->nodename, "versions", "%s",
+			    PVCALLS_VERSIONS);
+	if (err) {
+		pr_warn("%s write out 'version' failed\n", __func__);
+		return -EINVAL;
+	}
+
+	err = xenbus_printf(XBT_NIL, dev->nodename, "max-page-order", "%u",
+			    MAX_RING_ORDER);
+	if (err) {
+		pr_warn("%s write out 'max-page-order' failed\n", __func__);
+		return -EINVAL;
+	}
+
+	/* "1" means socket, connect, release, bind, listen, accept and poll*/
+	err = xenbus_printf(XBT_NIL, dev->nodename, "function-calls", "1");
+	if (err) {
+		pr_warn("%s write out 'function-calls' failed\n", __func__);
+		return -EINVAL;
+	}
+
+	err = xenbus_switch_state(dev, XenbusStateInitWait);
+	if (err)
+		return err;
+
 	return 0;
 }
 
+static void set_backend_state(struct xenbus_device *dev,
+			      enum xenbus_state state)
+{
+	while (dev->state != state) {
+		switch (dev->state) {
+		case XenbusStateClosed:
+			switch (state) {
+			case XenbusStateInitWait:
+			case XenbusStateConnected:
+				xenbus_switch_state(dev, XenbusStateInitWait);
+				break;
+			case XenbusStateClosing:
+				xenbus_switch_state(dev, XenbusStateClosing);
+				break;
+			default:
+				__WARN();
+			}
+			break;
+		case XenbusStateInitWait:
+		case XenbusStateInitialised:
+			switch (state) {
+			case XenbusStateConnected:
+				backend_connect(dev);
+				xenbus_switch_state(dev, XenbusStateConnected);
+				break;
+			case XenbusStateClosing:
+			case XenbusStateClosed:
+				xenbus_switch_state(dev, XenbusStateClosing);
+				break;
+			default:
+				__WARN();
+			}
+			break;
+		case XenbusStateConnected:
+			switch (state) {
+			case XenbusStateInitWait:
+			case XenbusStateClosing:
+			case XenbusStateClosed:
+				down_write(&pvcalls_back_global.privs_lock);
+				backend_disconnect(dev);
+				up_write(&pvcalls_back_global.privs_lock);
+				xenbus_switch_state(dev, XenbusStateClosing);
+				break;
+			default:
+				__WARN();
+			}
+			break;
+		case XenbusStateClosing:
+			switch (state) {
+			case XenbusStateInitWait:
+			case XenbusStateConnected:
+			case XenbusStateClosed:
+				xenbus_switch_state(dev, XenbusStateClosed);
+				break;
+			default:
+				__WARN();
+			}
+			break;
+		default:
+			__WARN();
+		}
+	}
+}
+
 static void pvcalls_back_changed(struct xenbus_device *dev,
 				 enum xenbus_state frontend_state)
 {
+	switch (frontend_state) {
+	case XenbusStateInitialising:
+		set_backend_state(dev, XenbusStateInitWait);
+		break;
+
+	case XenbusStateInitialised:
+	case XenbusStateConnected:
+		set_backend_state(dev, XenbusStateConnected);
+		break;
+
+	case XenbusStateClosing:
+		set_backend_state(dev, XenbusStateClosing);
+		break;
+
+	case XenbusStateClosed:
+		set_backend_state(dev, XenbusStateClosed);
+		if (xenbus_dev_is_online(dev))
+			break;
+		/* fall through if not online */
+	case XenbusStateUnknown:
+		set_backend_state(dev, XenbusStateClosed);
+		device_unregister(&dev->dev);
+		break;
+
+	default:
+		xenbus_dev_fatal(dev, -EINVAL, "saw state %d at frontend",
+				 frontend_state);
+		break;
+	}
 }
 
 static int pvcalls_back_remove(struct xenbus_device *dev)
-- 
1.9.1

* [PATCH 05/18] xen/pvcalls: connect to a frontend
From: Stefano Stabellini @ 2017-05-15 20:35 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

Introduce a per-frontend data structure named pvcalls_back_priv. It
contains pointers to the command ring, its event channel, a list of
active sockets and a tree of passive sockets (passive sockets need to
be looked up by id on listen, accept and poll commands, while active
sockets only on release).

It also has an unbound workqueue to schedule the work of parsing and
executing commands on the command ring. pvcallss_lock protects both data
structures. In pvcalls_back_global, keep a list of connected frontends.
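
backend_connect() below expects the frontend to have already published
its command ring and event channel in xenstore. An illustrative sketch
of the two nodes it reads (names taken from the code; exact paths per
the spec):

    .../device/pvcalls/<device-id>/port     = "<event channel number>"
    .../device/pvcalls/<device-id>/ring-ref = "<grant ref of the sring page>"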

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-back.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 87 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 86eca19..876e577 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -44,13 +44,100 @@ struct pvcalls_back_global {
 	struct rw_semaphore privs_lock;
 } pvcalls_back_global;
 
+struct pvcalls_back_priv {
+	struct list_head list;
+	struct xenbus_device *dev;
+	struct xen_pvcalls_sring *sring;
+	struct xen_pvcalls_back_ring ring;
+	int irq;
+	struct list_head socket_mappings;
+	struct radix_tree_root socketpass_mappings;
+	struct rw_semaphore pvcallss_lock;
+	atomic_t work;
+	struct workqueue_struct *wq;
+	struct work_struct register_work;
+};
+
 static void pvcalls_back_ioworker(struct work_struct *work)
 {
 }
 
+static void pvcalls_back_work(struct work_struct *work)
+{
+}
+
+static irqreturn_t pvcalls_back_event(int irq, void *dev_id)
+{
+	return IRQ_HANDLED;
+}
+
 static int backend_connect(struct xenbus_device *dev)
 {
+	int err, evtchn;
+	grant_ref_t ring_ref;
+	void *addr = NULL;
+	struct pvcalls_back_priv *priv = NULL;
+
+	priv = kzalloc(sizeof(struct pvcalls_back_priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+
+	err = xenbus_scanf(XBT_NIL, dev->otherend, "port", "%u",
+			   &evtchn);
+	if (err != 1) {
+		err = -EINVAL;
+		xenbus_dev_fatal(dev, err, "reading %s/event-channel",
+				 dev->otherend);
+		goto error;
+	}
+
+	err = xenbus_scanf(XBT_NIL, dev->otherend, "ring-ref", "%u", &ring_ref);
+	if (err != 1) {
+		err = -EINVAL;
+		xenbus_dev_fatal(dev, err, "reading %s/ring-ref",
+				 dev->otherend);
+		goto error;
+	}
+
+	err = xenbus_map_ring_valloc(dev, &ring_ref, 1, &addr);
+	if (err < 0)
+		goto error;
+
+	err = bind_interdomain_evtchn_to_irqhandler(dev->otherend_id, evtchn,
+						    pvcalls_back_event, 0,
+						    "pvcalls-backend", dev);
+	if (err < 0)
+		goto error;
+
+	priv->wq = alloc_workqueue("pvcalls_back_wq", WQ_UNBOUND, 1);
+	if (!priv->wq) {
+		err = -ENOMEM;
+		goto error;
+	}
+	INIT_WORK(&priv->register_work, pvcalls_back_work);
+	priv->dev = dev;
+	priv->sring = addr;
+	BACK_RING_INIT(&priv->ring, priv->sring, XEN_PAGE_SIZE * 1);
+	priv->irq = err;
+	INIT_LIST_HEAD(&priv->socket_mappings);
+	INIT_RADIX_TREE(&priv->socketpass_mappings, GFP_KERNEL);
+	init_rwsem(&priv->pvcallss_lock);
+	dev_set_drvdata(&dev->dev, priv);
+	down_write(&pvcalls_back_global.privs_lock);
+	list_add_tail(&priv->list, &pvcalls_back_global.privs);
+	up_write(&pvcalls_back_global.privs_lock);
+	queue_work(priv->wq, &priv->register_work);
+
 	return 0;
+
+ error:
+	if (addr != NULL)
+		xenbus_unmap_ring_vfree(dev, addr);
+	if (priv->wq)
+		destroy_workqueue(priv->wq);
+	unbind_from_irqhandler(priv->irq, dev);
+	kfree(priv);
+	return err;
 }
 
 static int backend_disconnect(struct xenbus_device *dev)
-- 
1.9.1

* [PATCH 06/18] xen/pvcalls: handle commands from the frontend
From: Stefano Stabellini @ 2017-05-15 20:35 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

When the other end notifies us that there are commands to be read
(pvcalls_back_event), wake up the backend worker to parse the commands.

The command ring works like most other Xen rings, so use the usual
ring macros to read and write to it. The functions implementing the
commands are empty stubs for now.
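
For background, the loop below follows the standard consumer idiom for
these macros. Note that RING_COPY_REQUEST (rather than
RING_GET_REQUEST) snapshots each request into backend-private memory, so
a misbehaving frontend cannot change a request while it is being
handled. A minimal sketch of the idiom (handle_cmd stands in for
pvcalls_back_handle_cmd):

    struct xen_pvcalls_request req;

    while (RING_HAS_UNCONSUMED_REQUESTS(&ring)) {
        /* work on a private copy, never on the shared page */
        RING_COPY_REQUEST(&ring, ring.req_cons++, &req);
        handle_cmd(&req);
    }
    /* closes the race with a producer that queued a late request */
    RING_FINAL_CHECK_FOR_REQUESTS(&ring, more);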

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-back.c | 115 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 115 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 876e577..2b2a49a 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -62,12 +62,127 @@ static void pvcalls_back_ioworker(struct work_struct *work)
 {
 }
 
+static int pvcalls_back_socket(struct xenbus_device *dev,
+		struct xen_pvcalls_request *req)
+{
+	return 0;
+}
+
+static int pvcalls_back_connect(struct xenbus_device *dev,
+				struct xen_pvcalls_request *req)
+{
+	return 0;
+}
+
+static int pvcalls_back_release(struct xenbus_device *dev,
+				struct xen_pvcalls_request *req)
+{
+	return 0;
+}
+
+static int pvcalls_back_bind(struct xenbus_device *dev,
+			     struct xen_pvcalls_request *req)
+{
+	return 0;
+}
+
+static int pvcalls_back_listen(struct xenbus_device *dev,
+			       struct xen_pvcalls_request *req)
+{
+	return 0;
+}
+
+static int pvcalls_back_accept(struct xenbus_device *dev,
+			       struct xen_pvcalls_request *req)
+{
+	return 0;
+}
+
+static int pvcalls_back_poll(struct xenbus_device *dev,
+			     struct xen_pvcalls_request *req)
+{
+	return 0;
+}
+
+static int pvcalls_back_handle_cmd(struct xenbus_device *dev,
+				   struct xen_pvcalls_request *req)
+{
+	int ret = 0;
+
+	switch (req->cmd) {
+	case PVCALLS_SOCKET:
+		ret = pvcalls_back_socket(dev, req);
+		break;
+	case PVCALLS_CONNECT:
+		ret = pvcalls_back_connect(dev, req);
+		break;
+	case PVCALLS_RELEASE:
+		ret = pvcalls_back_release(dev, req);
+		break;
+	case PVCALLS_BIND:
+		ret = pvcalls_back_bind(dev, req);
+		break;
+	case PVCALLS_LISTEN:
+		ret = pvcalls_back_listen(dev, req);
+		break;
+	case PVCALLS_ACCEPT:
+		ret = pvcalls_back_accept(dev, req);
+		break;
+	case PVCALLS_POLL:
+		ret = pvcalls_back_poll(dev, req);
+		break;
+	default:
+		ret = -ENOTSUPP;
+		break;
+	}
+	return ret;
+}
+
 static void pvcalls_back_work(struct work_struct *work)
 {
+	struct pvcalls_back_priv *priv = container_of(work,
+		struct pvcalls_back_priv, register_work);
+	int notify, notify_all = 0, more = 1;
+	struct xen_pvcalls_request req;
+	struct xenbus_device *dev = priv->dev;
+
+	atomic_set(&priv->work, 1);
+
+	while (more || !atomic_dec_and_test(&priv->work)) {
+		while (RING_HAS_UNCONSUMED_REQUESTS(&priv->ring)) {
+			RING_COPY_REQUEST(&priv->ring,
+					  priv->ring.req_cons++,
+					  &req);
+
+			if (pvcalls_back_handle_cmd(dev, &req) > 0) {
+				RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(
+					&priv->ring, notify);
+				notify_all += notify;
+			}
+		}
+
+		if (notify_all) {
+			notify_remote_via_irq(priv->irq);
+			notify_all = 0;
+		}
+
+		RING_FINAL_CHECK_FOR_REQUESTS(&priv->ring, more);
+	}
 }
 
 static irqreturn_t pvcalls_back_event(int irq, void *dev_id)
 {
+	struct xenbus_device *dev = dev_id;
+	struct pvcalls_back_priv *priv = NULL;
+
+	if (dev == NULL)
+		return IRQ_HANDLED;
+
+	priv = dev_get_drvdata(&dev->dev);
+	if (priv == NULL)
+		return IRQ_HANDLED;
+
+	atomic_inc(&priv->work);
+	queue_work(priv->wq, &priv->register_work);
+
 	return IRQ_HANDLED;
 }
 
-- 
1.9.1

* [PATCH 07/18] xen/pvcalls: implement socket command
From: Stefano Stabellini @ 2017-05-15 20:35 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

Just reply with success to the other end for now. Delay allocating the
actual socket until bind and/or connect.
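
An illustrative exchange (field values assumed): only AF_INET plus
SOCK_STREAM is accepted, anything else is refused with -EAFNOSUPPORT:

    request:  req_id=7  cmd=PVCALLS_SOCKET
              u.socket = { id=1, domain=AF_INET, type=SOCK_STREAM,
                           protocol=0 }
    response: req_id=7  cmd=PVCALLS_SOCKET  u.socket.id=1  ret=0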

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-back.c | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 2b2a49a..2eae096 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -12,12 +12,17 @@
  * GNU General Public License for more details.
  */
 
+#include <linux/inet.h>
 #include <linux/kthread.h>
 #include <linux/list.h>
 #include <linux/radix-tree.h>
 #include <linux/module.h>
 #include <linux/rwsem.h>
 #include <linux/wait.h>
+#include <net/sock.h>
+#include <net/inet_common.h>
+#include <net/inet_connection_sock.h>
+#include <net/request_sock.h>
 
 #include <xen/events.h>
 #include <xen/grant_table.h>
@@ -65,7 +70,31 @@ static void pvcalls_back_ioworker(struct work_struct *work)
 static int pvcalls_back_socket(struct xenbus_device *dev,
 		struct xen_pvcalls_request *req)
 {
-	return 0;
+	struct pvcalls_back_priv *priv;
+	int ret;
+	struct xen_pvcalls_response *rsp;
+
+	if (dev == NULL)
+		return 0;
+	priv = dev_get_drvdata(&dev->dev);
+
+	if (req->u.socket.domain != AF_INET ||
+	    req->u.socket.type != SOCK_STREAM ||
+	    (req->u.socket.protocol != 0 &&
+	     req->u.socket.protocol != AF_INET))
+		ret = -EAFNOSUPPORT;
+	else
+		ret = 0;
+
+	/* leave the actual socket allocation for later */
+
+	rsp = RING_GET_RESPONSE(&priv->ring, priv->ring.rsp_prod_pvt++);
+	rsp->req_id = req->req_id;
+	rsp->cmd = req->cmd;
+	rsp->u.socket.id = req->u.socket.id;
+	rsp->ret = ret;
+
+	return 1;
 }
 
 static int pvcalls_back_connect(struct xenbus_device *dev,
-- 
1.9.1

* [PATCH 08/18] xen/pvcalls: implement connect command
From: Stefano Stabellini @ 2017-05-15 20:36 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

Allocate a socket. Keep track of socket <-> ring mappings with a new data
structure, called sock_mapping. Implement the connect command by calling
inet_stream_connect, and mapping the new indexes page and data ring.
Associate the socket to an ioworker randomly.

When an active socket is closed (sk_state_change), set in_error to
-ENOTCONN and notify the other end, as specified by the protocol.

sk_data_ready will be implemented later.
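
For reference, the data ring mapped here is split in two halves, as
computed near the end of pvcalls_back_connect() (a sketch derived from
the flex ring definitions in patch 01; proportions illustrative):

    map->bytes                              map->bytes + XEN_FLEX_RING_SIZE(order)
    |<------- in (backend writes) ------->|<------- out (backend reads) ------->|

    map->data.in  = map->bytes;
    map->data.out = map->bytes + XEN_FLEX_RING_SIZE(map->ring_order);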

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-back.c | 145 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 145 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 2eae096..9ac1cf2 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -63,6 +63,29 @@ struct pvcalls_back_priv {
 	struct work_struct register_work;
 };
 
+struct sock_mapping {
+	struct list_head list;
+	struct list_head queue;
+	struct pvcalls_back_priv *priv;
+	struct socket *sock;
+	int data_worker;
+	uint64_t id;
+	grant_ref_t ref;
+	struct pvcalls_data_intf *ring;
+	void *bytes;
+	struct pvcalls_data data;
+	uint32_t ring_order;
+	int irq;
+	atomic_t read;
+	atomic_t write;
+	atomic_t release;
+	void (*saved_data_ready)(struct sock *sk);
+};
+
+static irqreturn_t pvcalls_back_conn_event(int irq, void *sock_map);
+static int pvcalls_back_release_active(struct xenbus_device *dev,
+				       struct pvcalls_back_priv *priv,
+				       struct sock_mapping *map);
 static void pvcalls_back_ioworker(struct work_struct *work)
 {
 }
@@ -97,9 +120,126 @@ static int pvcalls_back_socket(struct xenbus_device *dev,
 	return 1;
 }
 
+static void pvcalls_sk_state_change(struct sock *sock)
+{
+	struct sock_mapping *map = sock->sk_user_data;
+	struct pvcalls_data_intf *intf;
+
+	if (map == NULL)
+		return;
+
+	intf = map->ring;
+	intf->in_error = -ENOTCONN;
+	notify_remote_via_irq(map->irq);
+}
+
+static void pvcalls_sk_data_ready(struct sock *sock)
+{
+}
+
 static int pvcalls_back_connect(struct xenbus_device *dev,
 				struct xen_pvcalls_request *req)
 {
+	struct pvcalls_back_priv *priv;
+	int ret;
+	struct socket *sock;
+	struct sock_mapping *map = NULL;
+	void *page;
+	struct xen_pvcalls_response *rsp;
+
+	if (dev == NULL)
+		return 0;
+	priv = dev_get_drvdata(&dev->dev);
+
+	map = kzalloc(sizeof(*map), GFP_KERNEL);
+	if (map == NULL) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	ret = sock_create(AF_INET, SOCK_STREAM, 0, &sock);
+	if (ret < 0) {
+		kfree(map);
+		goto out;
+	}
+	INIT_LIST_HEAD(&map->queue);
+	map->data_worker = get_random_int() % pvcalls_back_global.nr_ioworkers;
+
+	map->priv = priv;
+	map->sock = sock;
+	map->id = req->u.connect.id;
+	map->ref = req->u.connect.ref;
+
+	ret = xenbus_map_ring_valloc(dev, &req->u.connect.ref, 1, &page);
+	if (ret < 0) {
+		sock_release(map->sock);
+		kfree(map);
+		goto out;
+	}
+	map->ring = page;
+	map->ring_order = map->ring->ring_order;
+	/* first read the order, then map the data ring */
+	virt_rmb();
+	if (map->ring_order > MAX_RING_ORDER) {
+		ret = -EFAULT;
+		sock_release(map->sock);
+		xenbus_unmap_ring_vfree(dev, map->ring);
+		kfree(map);
+		goto out;
+	}
+	ret = xenbus_map_ring_valloc(dev, map->ring->ref,
+				     (1 << map->ring_order), &page);
+	if (ret < 0) {
+		sock_release(map->sock);
+		xenbus_unmap_ring_vfree(dev, map->ring);
+		kfree(map);
+		goto out;
+	}
+	map->bytes = page;
+
+	ret = bind_interdomain_evtchn_to_irqhandler(priv->dev->otherend_id,
+						    req->u.connect.evtchn,
+						    pvcalls_back_conn_event,
+						    0,
+						    "pvcalls-backend",
+						    map);
+	if (ret < 0) {
+		sock_release(map->sock);
+		kfree(map);
+		goto out;
+	}
+	map->irq = ret;
+
+	map->data.in = map->bytes;
+	map->data.out = map->bytes + XEN_FLEX_RING_SIZE(map->ring_order);
+
+	down_write(&priv->pvcallss_lock);
+	list_add_tail(&map->list, &priv->socket_mappings);
+	up_write(&priv->pvcallss_lock);
+
+	ret = inet_stream_connect(sock, (struct sockaddr *)&req->u.connect.addr,
+				  req->u.connect.len, req->u.connect.flags);
+	if (ret < 0) {
+		pvcalls_back_release_active(dev, priv, map);
+	} else {
+		lock_sock(sock->sk);
+		map->saved_data_ready = sock->sk->sk_data_ready;
+		sock->sk->sk_user_data = map;
+		sock->sk->sk_data_ready = pvcalls_sk_data_ready;
+		sock->sk->sk_state_change = pvcalls_sk_state_change;
+		release_sock(sock->sk);
+	}
+
+out:
+	rsp = RING_GET_RESPONSE(&priv->ring, priv->ring.rsp_prod_pvt++);
+	rsp->req_id = req->req_id;
+	rsp->cmd = req->cmd;
+	rsp->u.connect.id = req->u.connect.id;
+	rsp->ret = ret;
+
+	return 1;
+}
+
+static int pvcalls_back_release_active(struct xenbus_device *dev,
+				       struct pvcalls_back_priv *priv,
+				       struct sock_mapping *map)
+{
 	return 0;
 }
 
@@ -215,6 +355,11 @@ static irqreturn_t pvcalls_back_event(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
+static irqreturn_t pvcalls_back_conn_event(int irq, void *sock_map)
+{
+	return IRQ_HANDLED;
+}
+
 static int backend_connect(struct xenbus_device *dev)
 {
 	int err, evtchn;
-- 
1.9.1
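
The in/out split of the data ring set up above follows from
XEN_FLEX_RING_SIZE: assuming the Xen ring header definition of
(1UL << (order + XEN_PAGE_SHIFT - 1)), i.e. half of the mapped area, the
offsets for a given ring_order work out as in this standalone sketch:

#include <stdio.h>

#define XEN_PAGE_SHIFT 12
#define XEN_FLEX_RING_SIZE(order) (1UL << ((order) + XEN_PAGE_SHIFT - 1))

int main(void)
{
	unsigned int order = 2;		/* ring->ring_order: 4 pages mapped */
	unsigned long mapped = 1UL << (order + XEN_PAGE_SHIFT);

	/* map->data.in starts at offset 0, map->data.out half way through */
	printf("mapped %lu bytes, in: [0, %lu), out: [%lu, %lu)\n",
	       mapped, XEN_FLEX_RING_SIZE(order),
	       XEN_FLEX_RING_SIZE(order), mapped);
	return 0;
}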


* [PATCH 09/18] xen/pvcalls: implement bind command
From: Stefano Stabellini @ 2017-05-15 20:36 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

Allocate a socket. Track the allocated passive sockets with a new data
structure named sockpass_mapping. It contains an unbound workqueue used
to schedule delayed work for the accept and poll commands, and a reqcopy
field used to store a copy of a request for delayed work; reads and
writes to reqcopy are protected by the "copy_lock" spinlock. Initialize
the workqueue in pvcalls_back_bind.

Implement the bind command with inet_bind.

The pass_sk_data_ready event handler will be added later.

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-back.c | 89 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 88 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 9ac1cf2..ff4634d 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -82,6 +82,18 @@ struct sock_mapping {
 	void (*saved_data_ready)(struct sock *sk);
 };
 
+struct sockpass_mapping {
+	struct list_head list;
+	struct pvcalls_back_priv *priv;
+	struct socket *sock;
+	uint64_t id;
+	struct xen_pvcalls_request reqcopy;
+	spinlock_t copy_lock;
+	struct workqueue_struct *wq;
+	struct work_struct register_work;
+	void (*saved_data_ready)(struct sock *sk);
+};
+
 static irqreturn_t pvcalls_back_conn_event(int irq, void *sock_map);
 static int pvcalls_back_release_active(struct xenbus_device *dev,
 				       struct pvcalls_back_priv *priv,
@@ -249,10 +261,85 @@ static int pvcalls_back_release(struct xenbus_device *dev,
 	return 0;
 }
 
+static void __pvcalls_back_accept(struct work_struct *work)
+{
+}
+
+static void pvcalls_pass_sk_data_ready(struct sock *sock)
+{
+}
+
 static int pvcalls_back_bind(struct xenbus_device *dev,
 			     struct xen_pvcalls_request *req)
 {
-	return 0;
+	struct pvcalls_back_priv *priv;
+	int ret, err;
+	struct socket *sock;
+	struct sockpass_mapping *map = NULL;
+	struct xen_pvcalls_response *rsp;
+
+	if (dev == NULL)
+		return 0;
+	priv = dev_get_drvdata(&dev->dev);
+
+	map = kzalloc(sizeof(*map), GFP_KERNEL);
+	if (map == NULL) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	INIT_WORK(&map->register_work, __pvcalls_back_accept);
+	spin_lock_init(&map->copy_lock);
+	map->wq = alloc_workqueue("pvcalls_wq", WQ_UNBOUND, 1);
+	if (!map->wq) {
+		ret = -ENOMEM;
+		kfree(map);
+		goto out;
+	}
+
+	ret = sock_create(AF_INET, SOCK_STREAM, 0, &sock);
+	if (ret < 0) {
+		destroy_workqueue(map->wq);
+		kfree(map);
+		goto out;
+	}
+
+	ret = inet_bind(sock, (struct sockaddr *)&req->u.bind.addr,
+			req->u.bind.len);
+	if (ret < 0) {
+		sock_release(sock);
+		destroy_workqueue(map->wq);
+		kfree(map);
+		goto out;
+	}
+
+	map->priv = priv;
+	map->sock = sock;
+	map->id = req->u.bind.id;
+
+	down_write(&priv->pvcallss_lock);
+	err = radix_tree_insert(&priv->socketpass_mappings, map->id,
+				map);
+	up_write(&priv->pvcallss_lock);
+	if (err) {
+		ret = err;
+		sock_release(sock);
+		destroy_workqueue(map->wq);
+		kfree(map);
+		goto out;
+	}
+
+	lock_sock(sock->sk);
+	map->saved_data_ready = sock->sk->sk_data_ready;
+	sock->sk->sk_user_data = map;
+	sock->sk->sk_data_ready = pvcalls_pass_sk_data_ready;
+	release_sock(sock->sk);
+
+out:
+	rsp = RING_GET_RESPONSE(&priv->ring, priv->ring.rsp_prod_pvt++);
+	rsp->req_id = req->req_id;
+	rsp->cmd = req->cmd;
+	rsp->u.bind.id = req->u.bind.id;
+	rsp->ret = ret;
+	return 1;
 }
 
 static int pvcalls_back_listen(struct xenbus_device *dev,
-- 
1.9.1
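
The reqcopy field and copy_lock behave as a one-slot mailbox between the
command handlers and the delayed work, as the later accept and poll
patches show. A minimal userspace sketch of the same pattern, with
pthread mutexes standing in for the spinlock and hypothetical names
throughout:

#include <pthread.h>
#include <stdio.h>

struct mailbox {
	pthread_mutex_t lock;
	int cmd;		/* 0: slot empty, nonzero: request pending */
};

/* Command handler side: claim the single slot or report busy. */
static int submit(struct mailbox *mb, int cmd)
{
	int busy;

	pthread_mutex_lock(&mb->lock);
	busy = (mb->cmd != 0);
	if (!busy)
		mb->cmd = cmd;
	pthread_mutex_unlock(&mb->lock);
	return busy ? -1 : 0;	/* the backend replies -EINTR when busy */
}

/* Delayed work side: consume the request, then free the slot. */
static void worker(struct mailbox *mb)
{
	int cmd;

	pthread_mutex_lock(&mb->lock);
	cmd = mb->cmd;
	pthread_mutex_unlock(&mb->lock);
	if (cmd == 0)
		return;		/* spurious wakeup, nothing queued */

	printf("handling cmd %d\n", cmd);

	pthread_mutex_lock(&mb->lock);
	mb->cmd = 0;		/* slot is free again */
	pthread_mutex_unlock(&mb->lock);
}

int main(void)
{
	struct mailbox mb = { PTHREAD_MUTEX_INITIALIZER, 0 };

	if (submit(&mb, 1) == 0)
		worker(&mb);
	return 0;
}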


* [PATCH 10/18] xen/pvcalls: implement listen command
From: Stefano Stabellini @ 2017-05-15 20:36 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

Call inet_listen to implement the listen command.

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-back.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index ff4634d..a762877 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -345,7 +345,28 @@ static int pvcalls_back_bind(struct xenbus_device *dev,
 static int pvcalls_back_listen(struct xenbus_device *dev,
 			       struct xen_pvcalls_request *req)
 {
-	return 0;
+	struct pvcalls_back_priv *priv;
+	int ret = -EINVAL;
+	struct sockpass_mapping *map;
+	struct xen_pvcalls_response *rsp;
+
+	if (dev == NULL)
+		return 0;
+	priv = dev_get_drvdata(&dev->dev);
+
+	map = radix_tree_lookup(&priv->socketpass_mappings, req->u.listen.id);
+	if (map == NULL)
+		goto out;
+
+	ret = inet_listen(map->sock, req->u.listen.backlog);
+
+out:
+	rsp = RING_GET_RESPONSE(&priv->ring, priv->ring.rsp_prod_pvt++);
+	rsp->req_id = req->req_id;
+	rsp->cmd = req->cmd;
+	rsp->u.listen.id = req->u.listen.id;
+	rsp->ret = ret;
+	return 1;
 }
 
 static int pvcalls_back_accept(struct xenbus_device *dev,
-- 
1.9.1
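
For reference, the socket, bind, and listen commands implemented so far
map onto the classic passive-open sequence. A userspace sketch with a
made-up port number; the corresponding PVCALLS_* commands are noted in
the comments:

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>

static int passive_open(int backlog)
{
	struct sockaddr_in addr;
	int fd;

	fd = socket(AF_INET, SOCK_STREAM, 0);	/* PVCALLS_SOCKET */
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(8080);		/* hypothetical port */

	if (bind(fd, (struct sockaddr *)&addr,	/* PVCALLS_BIND */
		 sizeof(addr)) < 0)
		return -1;
	if (listen(fd, backlog) < 0)		/* PVCALLS_LISTEN */
		return -1;
	return fd;
}

int main(void)
{
	return passive_open(16) < 0 ? 1 : 0;
}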


* [PATCH 11/18] xen/pvcalls: implement accept command
From: Stefano Stabellini @ 2017-05-15 20:36 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

Implement the accept command by calling inet_accept. To avoid blocking
in the kernel, call inet_accept(O_NONBLOCK) from a workqueue, which gets
scheduled on sk_data_ready (for a passive socket, this means that there
are connections to accept).

Use the reqcopy field to store the request. Accept the new socket from
the delayed work function, create a new sock_mapping for it, map
the indexes page and data ring, and reply to the other end. Choose an
ioworker for the socket randomly.

Only support one outstanding blocking accept request for every socket at
any time.

Add a field to sock_mapping to remember the passive socket from which an
active socket was created.

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-back.c | 156 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 156 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index a762877..d8e0a60 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -67,6 +67,7 @@ struct sock_mapping {
 	struct list_head list;
 	struct list_head queue;
 	struct pvcalls_back_priv *priv;
+	struct sockpass_mapping *sockpass;
 	struct socket *sock;
 	int data_worker;
 	uint64_t id;
@@ -263,10 +264,128 @@ static int pvcalls_back_release(struct xenbus_device *dev,
 
 static void __pvcalls_back_accept(struct work_struct *work)
 {
+	struct sockpass_mapping *mappass = container_of(
+		work, struct sockpass_mapping, register_work);
+	struct sock_mapping *map;
+	struct pvcalls_ioworker *iow;
+	struct pvcalls_back_priv *priv;
+	struct xen_pvcalls_response *rsp;
+	struct xen_pvcalls_request *req;
+	void *page = NULL;
+	int notify;
+	int ret = -EINVAL;
+	unsigned long flags;
+
+	priv = mappass->priv;
+	/* We only need to check the value of "cmd" atomically on read. */
+	spin_lock_irqsave(&mappass->copy_lock, flags);
+	req = &mappass->reqcopy;
+	if (req->cmd != PVCALLS_ACCEPT) {
+		spin_unlock_irqrestore(&mappass->copy_lock, flags);
+		return;
+	}
+	spin_unlock_irqrestore(&mappass->copy_lock, flags);
+
+	map = kzalloc(sizeof(*map), GFP_KERNEL);
+	if (map == NULL) {
+		ret = -ENOMEM;
+		goto out_error;
+	}
+
+	map->sock = sock_alloc();
+	if (!map->sock) {
+		ret = -ENOMEM;
+		goto out_error;
+	}
+
+	INIT_LIST_HEAD(&map->queue);
+	map->data_worker = get_random_int() % pvcalls_back_global.nr_ioworkers;
+	map->ref = req->u.accept.ref;
+
+	map->priv = priv;
+	map->sockpass = mappass;
+	map->sock->type = mappass->sock->type;
+	map->sock->ops = mappass->sock->ops;
+	map->id = req->u.accept.id_new;
+
+	ret = xenbus_map_ring_valloc(priv->dev, &req->u.accept.ref, 1, &page);
+	if (ret < 0)
+		goto out_error;
+	map->ring = page;
+	map->ring_order = map->ring->ring_order;
+	/* first read the order, then map the data ring */
+	virt_rmb();
+	if (map->ring_order > MAX_RING_ORDER) {
+		ret = -EFAULT;
+		goto out_error;
+	}
+	ret = xenbus_map_ring_valloc(priv->dev, map->ring->ref,
+				     (1 << map->ring_order), &page);
+	if (ret < 0)
+		goto out_error;
+	map->bytes = page;
+
+	ret = bind_interdomain_evtchn_to_irqhandler(priv->dev->otherend_id,
+						    req->u.accept.evtchn,
+						    pvcalls_back_conn_event,
+						    0,
+						    "pvcalls-backend",
+						    map);
+	if (ret < 0)
+		goto out_error;
+	map->irq = ret;
+
+	map->data.in = map->bytes;
+	map->data.out = map->bytes + XEN_FLEX_RING_SIZE(map->ring_order);
+
+	down_write(&priv->pvcallss_lock);
+	list_add_tail(&map->list, &priv->socket_mappings);
+	up_write(&priv->pvcallss_lock);
+
+	ret = inet_accept(mappass->sock, map->sock, O_NONBLOCK, true);
+	if (ret == -EAGAIN)
+		goto out_error;
+
+	lock_sock(map->sock->sk);
+	map->saved_data_ready = map->sock->sk->sk_data_ready;
+	map->sock->sk->sk_user_data = map;
+	map->sock->sk->sk_data_ready = pvcalls_sk_data_ready;
+	map->sock->sk->sk_state_change = pvcalls_sk_state_change;
+	release_sock(map->sock->sk);
+
+	iow = &pvcalls_back_global.ioworkers[map->data_worker];
+	spin_lock_irqsave(&iow->lock, flags);
+	atomic_inc(&map->read);
+	if (list_empty(&map->queue))
+		list_add_tail(&map->queue, &iow->wqs);
+	spin_unlock_irqrestore(&iow->lock, flags);
+	atomic_inc(&iow->io);
+	queue_work_on(map->data_worker, pvcalls_back_global.wq,
+		      &iow->register_work);
+
+out_error:
+	if (ret < 0)
+		pvcalls_back_release_active(priv->dev, priv, map);
+
+	rsp = RING_GET_RESPONSE(&priv->ring, priv->ring.rsp_prod_pvt++);
+	rsp->req_id = req->req_id;
+	rsp->cmd = req->cmd;
+	rsp->u.accept.id = req->u.accept.id;
+	rsp->ret = ret;
+	RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&priv->ring, notify);
+	if (notify)
+		notify_remote_via_irq(priv->irq);
+
+	spin_lock_irqsave(&mappass->copy_lock, flags);
+	mappass->reqcopy.cmd = 0;
+	spin_unlock_irqrestore(&mappass->copy_lock, flags);
 }
 
 static void pvcalls_pass_sk_data_ready(struct sock *sock)
 {
+	struct sockpass_mapping *mappass = sock->sk_user_data;
+
+	if (mappass == NULL)
+		return;
+
+	queue_work(mappass->wq, &mappass->register_work);
 }
 
 static int pvcalls_back_bind(struct xenbus_device *dev,
@@ -372,7 +491,44 @@ static int pvcalls_back_listen(struct xenbus_device *dev,
 static int pvcalls_back_accept(struct xenbus_device *dev,
 			       struct xen_pvcalls_request *req)
 {
+	struct pvcalls_back_priv *priv;
+	struct sockpass_mapping *mappass;
+	int ret = -EINVAL;
+	struct xen_pvcalls_response *rsp;
+	unsigned long flags;
+
+	if (dev == NULL)
+		return 0;
+	priv = dev_get_drvdata(&dev->dev);
+
+	mappass = radix_tree_lookup(&priv->socketpass_mappings,
+		req->u.accept.id);
+	if (mappass == NULL)
+		goto out_error;
+
+	/*
+	 * Limitation of the current implementation: only support one
+	 * concurrent accept or poll call on one socket.
+	 */
+	spin_lock_irqsave(&mappass->copy_lock, flags);
+	if (mappass->reqcopy.cmd != 0) {
+		spin_unlock_irqrestore(&mappass->copy_lock, flags);
+		ret = -EINTR;
+		goto out_error;
+	}
+
+	mappass->reqcopy = *req;
+	spin_unlock_irqrestore(&mappass->copy_lock, flags);
+	queue_work(mappass->wq, &mappass->register_work);
 	return 0;
+
+out_error:
+	rsp = RING_GET_RESPONSE(&priv->ring, priv->ring.rsp_prod_pvt++);
+	rsp->req_id = req->req_id;
+	rsp->cmd = req->cmd;
+	rsp->u.accept.id = req->u.accept.id;
+	rsp->ret = ret;
+	return 1;
 }
 
 static int pvcalls_back_poll(struct xenbus_device *dev,
-- 
1.9.1
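
The inet_accept(..., O_NONBLOCK) call in the worker is, roughly, a
non-blocking accept(2) retried on every data-ready wakeup. A userspace
analogue of the same logic (illustrative sketch, hypothetical helper
name):

#include <sys/socket.h>
#include <fcntl.h>
#include <errno.h>
#include <stddef.h>

/* Run whenever the listening socket reports activity; never blocks. */
static int try_accept(int lfd)
{
	int cfd;

	fcntl(lfd, F_SETFL, fcntl(lfd, F_GETFL, 0) | O_NONBLOCK);
	cfd = accept(lfd, NULL, NULL);
	if (cfd < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return -1;	/* nothing queued, wait for the next wakeup */
	return cfd;		/* new connection, or a real error */
}

int main(void)
{
	int lfd = socket(AF_INET, SOCK_STREAM, 0);

	listen(lfd, 1);		/* unbound: the kernel picks a port */
	return try_accept(lfd) < 0 ? 0 : 1;	/* expect nothing queued */
}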


* [PATCH 12/18] xen/pvcalls: implement poll command
From: Stefano Stabellini @ 2017-05-15 20:36 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

Implement poll on passive sockets by requesting a delayed response with
mappass->reqcopy, and reply back when there is data on the passive
socket.

Poll on an active socket is intentionally left unimplemented, as per the
spec: the frontend should just wait for events and check the indexes on
the indexes page.

Only support one outstanding poll (or accept) request for every passive
socket at any given time.

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-back.c | 70 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 69 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index d8e0a60..d5b7412 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -381,11 +381,30 @@ static void __pvcalls_back_accept(struct work_struct *work)
 static void pvcalls_pass_sk_data_ready(struct sock *sock)
 {
 	struct sockpass_mapping *mappass = sock->sk_user_data;
+	struct pvcalls_back_priv *priv;
+	struct xen_pvcalls_response *rsp;
+	unsigned long flags;
+	int notify;
 
 	if (mappass == NULL)
 		return;
 
-	queue_work(mappass->wq, &mappass->register_work);
+	priv = mappass->priv;
+	spin_lock_irqsave(&mappass->copy_lock, flags);
+	if (mappass->reqcopy.cmd == PVCALLS_POLL) {
+		rsp = RING_GET_RESPONSE(&priv->ring, priv->ring.rsp_prod_pvt++);
+		rsp->req_id = mappass->reqcopy.req_id;
+		rsp->u.poll.id = mappass->reqcopy.u.poll.id;
+		rsp->cmd = mappass->reqcopy.cmd;
+		rsp->ret = 0;
+
+		mappass->reqcopy.cmd = 0;
+		RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&priv->ring, notify);
+		if (notify)
+			notify_remote_via_irq(mappass->priv->irq);
+	} else {
+		queue_work(mappass->wq, &mappass->register_work);
+	}
+	spin_unlock_irqrestore(&mappass->copy_lock, flags);
 }
 
 static int pvcalls_back_bind(struct xenbus_device *dev,
@@ -534,6 +553,55 @@ static int pvcalls_back_accept(struct xenbus_device *dev,
 static int pvcalls_back_poll(struct xenbus_device *dev,
 			     struct xen_pvcalls_request *req)
 {
+	struct pvcalls_back_priv *priv;
+	struct sockpass_mapping *mappass;
+	struct xen_pvcalls_response *rsp;
+	struct inet_connection_sock *icsk;
+	struct request_sock_queue *queue;
+	unsigned long flags;
+	int ret;
+	bool data;
+
+	if (dev == NULL)
+		return 0;
+	priv = dev_get_drvdata(&dev->dev);
+
+	mappass = radix_tree_lookup(&priv->socketpass_mappings, req->u.poll.id);
+	if (mappass == NULL)
+		return 0;
+
+	/*
+	 * Limitation of the current implementation: only support one
+	 * concurrent accept or poll call on one socket.
+	 */
+	spin_lock_irqsave(&mappass->copy_lock, flags);
+	if (mappass->reqcopy.cmd != 0) {
+		ret = -EINTR;
+		goto out;
+	}
+
+	mappass->reqcopy = *req;
+	lock_sock(mappass->sock->sk);
+	icsk = inet_csk(mappass->sock->sk);
+	queue = &icsk->icsk_accept_queue;
+	data = queue->rskq_accept_head != NULL;
+	release_sock(mappass->sock->sk);
+	if (data) {
+		mappass->reqcopy.cmd = 0;
+		ret = 0;
+		goto out;
+	}
+
+	spin_unlock_irqrestore(&mappass->copy_lock, flags);
+	return 0;
+
+out:
+	spin_unlock_irqrestore(&mappass->copy_lock, flags);
+
+	rsp = RING_GET_RESPONSE(&priv->ring, priv->ring.rsp_prod_pvt++);
+	rsp->req_id = req->req_id;
+	rsp->cmd = req->cmd;
+	rsp->u.poll.id = req->u.poll.id;
+	rsp->ret = ret;
 	return 0;
 }
 
-- 
1.9.1
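
What the poll command computes is the answer poll(2) gives for POLLIN on
a listening descriptor: whether an accept would complete immediately. A
userspace sketch of the immediate-completion path (hypothetical names):

#include <poll.h>
#include <sys/socket.h>
#include <stdio.h>

/* 1: a connection is queued and the reply can be sent right away;
 * 0: nothing yet, so the reply is deferred until data_ready fires. */
static int passive_poll(int lfd)
{
	struct pollfd pfd = { .fd = lfd, .events = POLLIN };

	return poll(&pfd, 1, 0);	/* timeout 0: just sample the queue */
}

int main(void)
{
	int lfd = socket(AF_INET, SOCK_STREAM, 0);

	listen(lfd, 1);
	printf("pending connection: %d\n", passive_poll(lfd));
	return 0;
}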


* [PATCH 13/18] xen/pvcalls: implement release command
From: Stefano Stabellini @ 2017-05-15 20:36 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

Release both active and passive sockets. For active sockets, make sure
to avoid possible conflicts with the ioworker reading/writing to those
sockets concurrently. Set map->release to let the ioworker know
atomically that the socket will be released soon, then wait until the
ioworker has removed the socket from its list.

Unmap indexes pages and data rings.

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-back.c | 94 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 93 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index d5b7412..22c6426 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -253,13 +253,105 @@ static int pvcalls_back_release_active(struct xenbus_device *dev,
 				       struct pvcalls_back_priv *priv,
 				       struct sock_mapping *map)
 {
+	struct pvcalls_ioworker *iow =
+	    &pvcalls_back_global.ioworkers[map->data_worker];
+	unsigned long flags;
+	bool in_loop = false;
+
+	disable_irq(map->irq);
+	if (map->sock->sk != NULL) {
+		lock_sock(map->sock->sk);
+		map->sock->sk->sk_user_data = NULL;
+		map->sock->sk->sk_data_ready = map->saved_data_ready;
+		release_sock(map->sock->sk);
+	}
+
+	atomic_set(&map->release, 1);
+
+	/*
+	 * To avoid concurrency problems with ioworker, check if the socket
+	 * has any outstanding io requests. If so, wait until the ioworker
+	 * removes it from the list before proceeding.
+	 */
+	spin_lock_irqsave(&iow->lock, flags);
+	in_loop = !list_empty(&map->queue);
+	spin_unlock_irqrestore(&iow->lock, flags);
+
+	if (in_loop) {
+		atomic_inc(&iow->io);
+		queue_work_on(map->data_worker, pvcalls_back_global.wq,
+			      &iow->register_work);
+		while (atomic_read(&map->release) > 0)
+			cond_resched();
+	}
+
+	down_write(&priv->pvcallss_lock);
+	list_del(&map->list);
+	up_write(&priv->pvcallss_lock);
+
+	xenbus_unmap_ring_vfree(dev, (void *)map->bytes);
+	xenbus_unmap_ring_vfree(dev, (void *)map->ring);
+	unbind_from_irqhandler(map->irq, map);
+
+	sock_release(map->sock);
+	kfree(map);
+
+	return 0;
+}
+
+static int pvcalls_back_release_passive(struct xenbus_device *dev,
+					struct pvcalls_back_priv *priv,
+					struct sockpass_mapping *mappass)
+{
+	if (mappass->sock->sk != NULL) {
+		lock_sock(mappass->sock->sk);
+		mappass->sock->sk->sk_user_data = NULL;
+		mappass->sock->sk->sk_data_ready = mappass->saved_data_ready;
+		release_sock(mappass->sock->sk);
+	}
+	down_write(&priv->pvcallss_lock);
+	radix_tree_delete(&priv->socketpass_mappings, mappass->id);
+	sock_release(mappass->sock);
+	flush_workqueue(mappass->wq);
+	destroy_workqueue(mappass->wq);
+	kfree(mappass);
+	up_write(&priv->pvcallss_lock);
+
 	return 0;
 }
 
 static int pvcalls_back_release(struct xenbus_device *dev,
 				struct xen_pvcalls_request *req)
 {
-	return 0;
+	struct pvcalls_back_priv *priv;
+	struct sock_mapping *map, *n;
+	struct sockpass_mapping *mappass;
+	int ret = 0;
+	struct xen_pvcalls_response *rsp;
+
+	priv = dev_get_drvdata(&dev->dev);
+
+	list_for_each_entry_safe(map, n, &priv->socket_mappings, list) {
+		if (map->id == req->u.release.id) {
+			ret = pvcalls_back_release_active(dev, priv, map);
+			goto out;
+		}
+	}
+	mappass = radix_tree_lookup(&priv->socketpass_mappings,
+				    req->u.release.id);
+	if (mappass != NULL) {
+		ret = pvcalls_back_release_passive(dev, priv, mappass);
+		goto out;
+	}
+
+out:
+	rsp = RING_GET_RESPONSE(&priv->ring, priv->ring.rsp_prod_pvt++);
+	rsp->req_id = req->req_id;
+	rsp->u.release.id = req->u.release.id;
+	rsp->cmd = req->cmd;
+	rsp->ret = ret;
+	return 1;
 }
 
 static void __pvcalls_back_accept(struct work_struct *work)
-- 
1.9.1
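
The map->release handshake above is a flag-and-wait protocol between the
releasing context and the ioworker: set the flag, kick the worker, then
spin (yielding) until the worker acknowledges. A standalone C11 sketch of
the idea, with sched_yield() standing in for cond_resched():

#include <pthread.h>
#include <stdatomic.h>
#include <sched.h>

static atomic_int release_flag;

/* Ioworker side: drop the socket from its list, then acknowledge. */
static void *worker(void *arg)
{
	(void)arg;
	while (atomic_load(&release_flag) == 0)
		sched_yield();			/* wait to be kicked */
	/* ...list_del_init(&map->queue) happens here in the driver... */
	atomic_store(&release_flag, 0);		/* ack: safe to free */
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, worker, NULL);
	atomic_store(&release_flag, 1);		/* request removal */
	while (atomic_load(&release_flag) > 0)
		sched_yield();			/* cond_resched() here */
	/* from this point the rings can be unmapped and the map freed */
	pthread_join(t, NULL);
	return 0;
}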


* [PATCH 14/18] xen/pvcalls: disconnect and module_exit
From: Stefano Stabellini @ 2017-05-15 20:36 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

Implement backend_disconnect. Call pvcalls_back_release_active on active
sockets and pvcalls_back_release_passive on passive sockets.

Implement module_exit by calling backend_disconnect on frontend
connections.

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-back.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 22c6426..0daa90a 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -855,6 +855,35 @@ static int backend_connect(struct xenbus_device *dev)
 
 static int backend_disconnect(struct xenbus_device *dev)
 {
+	struct pvcalls_back_priv *priv;
+	struct sock_mapping *map, *n;
+	struct sockpass_mapping *mappass;
+	struct radix_tree_iter iter;
+	void **slot;
+
+	priv = dev_get_drvdata(&dev->dev);
+
+	list_for_each_entry_safe(map, n, &priv->socket_mappings, list) {
+		pvcalls_back_release_active(dev, priv, map);
+	}
+	radix_tree_for_each_slot(slot, &priv->socketpass_mappings, &iter, 0) {
+		mappass = radix_tree_deref_slot(slot);
+		if (!mappass || radix_tree_exception(mappass)) {
+			if (radix_tree_deref_retry(mappass)) {
+				slot = radix_tree_iter_retry(&iter);
+				continue;
+			}
+		} else
+			pvcalls_back_release_passive(dev, priv, mappass);
+	}
+	xenbus_unmap_ring_vfree(dev, (void *)priv->sring);
+	unbind_from_irqhandler(priv->irq, dev);
+	list_del(&priv->list);
+	destroy_workqueue(priv->wq);
+	kfree(priv);
+	dev_set_drvdata(&dev->dev, NULL);
+
 	return 0;
 }
 
@@ -1056,3 +1085,22 @@ static int __init pvcalls_back_init(void)
 	return -ENOMEM;
 }
 module_init(pvcalls_back_init);
+
+static void __exit pvcalls_back_fin(void)
+{
+	struct pvcalls_back_priv *priv, *npriv;
+
+	down_write(&pvcalls_back_global.privs_lock);
+	list_for_each_entry_safe(priv, npriv, &pvcalls_back_global.privs,
+				 list) {
+		backend_disconnect(priv->dev);
+	}
+	up_write(&pvcalls_back_global.privs_lock);
+
+	xenbus_unregister_driver(&pvcalls_back_driver);
+	destroy_workqueue(pvcalls_back_global.wq);
+	kfree(pvcalls_back_global.ioworkers);
+	memset(&pvcalls_back_global, 0, sizeof(pvcalls_back_global));
+}
+
+module_exit(pvcalls_back_fin);
-- 
1.9.1


* [PATCH 15/18] xen/pvcalls: introduce the ioworker
  2017-05-15 20:35 ` [PATCH 01/18] xen: introduce the pvcalls interface header Stefano Stabellini
                     ` (16 preceding siblings ...)
  2017-05-15 20:36   ` Stefano Stabellini
@ 2017-05-15 20:36   ` Stefano Stabellini
  2017-05-15 20:36   ` Stefano Stabellini
                     ` (4 subsequent siblings)
  22 siblings, 0 replies; 81+ messages in thread
From: Stefano Stabellini @ 2017-05-15 20:36 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

We have one ioworker per cpu core. Each ioworker gets assigned active
sockets randomly. Once a socket is assigned to an ioworker, it remains
tied to it until it is released.

Each ioworker goes through the list of outstanding read/write requests
by walking a list of struct sock_mapping. Once a request has been dealt
with, the struct sock_mapping is removed from the list.

We use one atomic counter per socket for "read" operations and one
for "write" operations to keep track of the reads/writes to do.

We also use one atomic counter ("io") per ioworker to keep track of how
many outstanding requests we have in total assigned to the ioworker. The
ioworker finishes when there are none.
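
(As a minimal sketch of the queueing protocol described above: the
names mirror the fields this patch introduces, and the snippet is
illustrative rather than part of the series; patches 16 and 17 contain
the real producers:)

	/* producer side: queue a socket on its assigned ioworker */
	spin_lock_irqsave(&iow->lock, flags);
	atomic_inc(&map->read);		/* one more read to perform */
	if (list_empty(&map->queue))
		list_add_tail(&map->queue, &iow->wqs);
	spin_unlock_irqrestore(&iow->lock, flags);
	atomic_inc(&iow->io);		/* keep the ioworker running */
	queue_work_on(map->data_worker, pvcalls_back_global.wq,
		      &iow->register_work);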

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-back.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 0daa90a..db3e02c 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -99,8 +99,52 @@ struct sockpass_mapping {
 static int pvcalls_back_release_active(struct xenbus_device *dev,
 				       struct pvcalls_back_priv *priv,
 				       struct sock_mapping *map);
+
+static void pvcalls_conn_back_read(unsigned long opaque)
+{
+}
+
+static int pvcalls_conn_back_write(struct sock_mapping *map)
+{
+	return 0;
+}
+
 static void pvcalls_back_ioworker(struct work_struct *work)
 {
+	struct pvcalls_ioworker *ioworker = container_of(work,
+		struct pvcalls_ioworker, register_work);
+	int num = ioworker->num;
+	struct sock_mapping *map, *n;
+	unsigned long flags;
+
+	while (atomic_read(&ioworker->io) > 0) {
+		spin_lock_irqsave(&ioworker->lock, flags);
+		list_for_each_entry_safe(map, n, &ioworker->wqs, queue) {
+			if (map->data_worker != num)
+				continue;
+
+			if (atomic_read(&map->release) > 0) {
+				list_del_init(&map->queue);
+				atomic_set(&map->release, 0);
+				continue;
+			}
+
+			spin_unlock_irqrestore(&ioworker->lock, flags);
+			if (atomic_read(&map->read) > 0)
+				pvcalls_conn_back_read((unsigned long)map);
+			if (atomic_read(&map->write) > 0)
+				pvcalls_conn_back_write(map);
+			spin_lock_irqsave(&ioworker->lock, flags);
+
+			if (atomic_read(&map->read) == 0 &&
+				atomic_read(&map->write) == 0) {
+				list_del_init(&map->queue);
+				atomic_set(&map->release, 0);
+			}
+		}
+		atomic_dec(&ioworker->io);
+		spin_unlock_irqrestore(&ioworker->lock, flags);
+	}
 }
 
 static int pvcalls_back_socket(struct xenbus_device *dev,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 16/18] xen/pvcalls: implement read
  2017-05-15 20:35 ` [PATCH 01/18] xen: introduce the pvcalls interface header Stefano Stabellini
@ 2017-05-15 20:36     ` Stefano Stabellini
  2017-05-15 20:35   ` Stefano Stabellini
                       ` (21 subsequent siblings)
  22 siblings, 0 replies; 81+ messages in thread
From: Stefano Stabellini @ 2017-05-15 20:36 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

When an active socket has data available, add the corresponding
sock_mapping to the ioworker list, increment the io and read counters,
and schedule the ioworker.

Implement the read function by reading from the socket and writing the
data to the data ring.

Set in_error on error.
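
(For reference, pvcalls_mask() and pvcalls_queued() are not shown in
this hunk; they appear to come from the DEFINE_XEN_FLEX_RING() helpers
referenced by the interface header in patch 01. Roughly, and only as a
sketch that may differ from the generated code in detail:)

	/* ring sizes (XEN_FLEX_RING_SIZE) are powers of two */
	static RING_IDX pvcalls_mask(RING_IDX idx, RING_IDX ring_size)
	{
		return idx & (ring_size - 1);
	}

	/* bytes queued between free-running prod and cons */
	static RING_IDX pvcalls_queued(RING_IDX prod, RING_IDX cons,
				       RING_IDX ring_size)
	{
		RING_IDX size = prod - cons;

		return size > ring_size ? ring_size : size;
	}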

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-back.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index db3e02c..0f715a8 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -102,6 +102,79 @@ static int pvcalls_back_release_active(struct xenbus_device *dev,
 
 static void pvcalls_conn_back_read(unsigned long opaque)
 {
+	struct sock_mapping *map = (struct sock_mapping *)opaque;
+	struct msghdr msg;
+	struct kvec vec[2];
+	RING_IDX cons, prod, size, wanted, array_size, masked_prod, masked_cons;
+	int32_t error;
+	struct pvcalls_data_intf *intf = map->ring;
+	struct pvcalls_data *data = &map->data;
+	int ret;
+
+	array_size = XEN_FLEX_RING_SIZE(map->ring_order);
+	cons = intf->in_cons;
+	prod = intf->in_prod;
+	error = intf->in_error;
+	/* read the indexes first, then deal with the data */
+	virt_mb();
+
+	if (error)
+		return;
+
+	size = pvcalls_queued(prod, cons, array_size);
+	if (size >= array_size)
+		return;
+	lock_sock(map->sock->sk);
+	if (skb_queue_empty(&map->sock->sk->sk_receive_queue)) {
+		atomic_set(&map->read, 0);
+		release_sock(map->sock->sk);
+		return;
+	}
+	release_sock(map->sock->sk);
+	wanted = array_size - size;
+	masked_prod = pvcalls_mask(prod, array_size);
+	masked_cons = pvcalls_mask(cons, array_size);
+
+	memset(&msg, 0, sizeof(msg));
+	msg.msg_iter.type = ITER_KVEC|WRITE;
+	msg.msg_iter.count = wanted;
+	if (masked_prod < masked_cons) {
+		vec[0].iov_base = data->in + masked_prod;
+		vec[0].iov_len = wanted;
+		msg.msg_iter.kvec = vec;
+		msg.msg_iter.nr_segs = 1;
+	} else {
+		vec[0].iov_base = data->in + masked_prod;
+		vec[0].iov_len = array_size - masked_prod;
+		vec[1].iov_base = data->in;
+		vec[1].iov_len = wanted - vec[0].iov_len;
+		msg.msg_iter.kvec = vec;
+		msg.msg_iter.nr_segs = 2;
+	}
+
+	atomic_set(&map->read, 0);
+	ret = inet_recvmsg(map->sock, &msg, wanted, MSG_DONTWAIT);
+	WARN_ON(ret > 0 && ret > wanted);
+	if (ret == -EAGAIN) /* shouldn't happen */
+		return;
+	if (!ret)
+		ret = -ENOTCONN;
+	lock_sock(map->sock->sk);
+	if (ret > 0 && !skb_queue_empty(&map->sock->sk->sk_receive_queue))
+		atomic_inc(&map->read);
+	release_sock(map->sock->sk);
+
+	/* write the data, then modify the indexes */
+	virt_wmb();
+	if (ret < 0)
+		intf->in_error = ret;
+	else
+		intf->in_prod = prod + ret;
+	/* update the indexes, then notify the other end */
+	virt_wmb();
+	notify_remote_via_irq(map->irq);
+
+	return;
 }
 
 static int pvcalls_conn_back_write(struct sock_mapping *map)
@@ -192,6 +265,22 @@ static void pvcalls_sk_state_change(struct sock *sock)
 
 static void pvcalls_sk_data_ready(struct sock *sock)
 {
+	struct sock_mapping *map = sock->sk_user_data;
+	struct pvcalls_ioworker *iow;
+	unsigned long flags;
+
+	if (map == NULL)
+		return;
+
+	iow = &pvcalls_back_global.ioworkers[map->data_worker];
+	spin_lock_irqsave(&iow->lock, flags);
+	atomic_inc(&map->read);
+	if (list_empty(&map->queue))
+		list_add_tail(&map->queue, &iow->wqs);
+	spin_unlock_irqrestore(&iow->lock, flags);
+	atomic_inc(&iow->io);
+	queue_work_on(map->data_worker, pvcalls_back_global.wq,
+		&iow->register_work);
 }
 
 static int pvcalls_back_connect(struct xenbus_device *dev,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 17/18] xen/pvcalls: implement write
  2017-05-15 20:35 ` [PATCH 01/18] xen: introduce the pvcalls interface header Stefano Stabellini
                     ` (20 preceding siblings ...)
  2017-05-15 20:36   ` [PATCH 17/18] xen/pvcalls: implement write Stefano Stabellini
@ 2017-05-15 20:36   ` Stefano Stabellini
  2017-05-15 20:36     ` Stefano Stabellini
  22 siblings, 0 replies; 81+ messages in thread
From: Stefano Stabellini @ 2017-05-15 20:36 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

When the other end notifies us that there is data to be written
(pvcalls_back_conn_event), add the corresponding sock_mapping to the
ioworker list, increment the io and write counters, and schedule the
ioworker.

Implement the write function, called by the ioworker, by reading the
data from the data ring and writing it to the socket with inet_sendmsg.

Set out_error on error.
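
(Schematically, the two-kvec split below covers the case where the
queued bytes wrap past the end of the ring buffer; this restatement is
illustrative only:)

	masked_prod = pvcalls_mask(prod, ring_size);
	masked_cons = pvcalls_mask(cons, ring_size);
	if (masked_prod > masked_cons) {
		/* contiguous: [masked_cons, masked_prod) */
		vec[0].iov_base = data->out + masked_cons;
		vec[0].iov_len = size;
	} else {
		/* wraps: [masked_cons, ring_size) + [0, masked_prod) */
		vec[0].iov_base = data->out + masked_cons;
		vec[0].iov_len = ring_size - masked_cons;
		vec[1].iov_base = data->out;
		vec[1].iov_len = size - vec[0].iov_len;
	}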

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-back.c | 80 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 79 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 0f715a8..2de43c3 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -179,7 +179,67 @@ static void pvcalls_conn_back_read(unsigned long opaque)
 
 static int pvcalls_conn_back_write(struct sock_mapping *map)
 {
-	return 0;
+	struct pvcalls_data_intf *intf = map->ring;
+	struct pvcalls_data *data = &map->data;
+	struct msghdr msg;
+	struct kvec vec[2];
+	RING_IDX cons, prod, size, ring_size;
+	int ret;
+
+	cons = intf->out_cons;
+	prod = intf->out_prod;
+	/* read the indexes before dealing with the data */
+	virt_mb();
+
+	ring_size = XEN_FLEX_RING_SIZE(map->ring_order);
+	size = pvcalls_queued(prod, cons, ring_size);
+	if (size == 0)
+		return 0;
+
+	memset(&msg, 0, sizeof(msg));
+	msg.msg_flags |= MSG_DONTWAIT;
+	msg.msg_iter.type = ITER_KVEC|READ;
+	msg.msg_iter.count = size;
+	if (pvcalls_mask(prod, ring_size) > pvcalls_mask(cons, ring_size)) {
+		vec[0].iov_base = data->out + pvcalls_mask(cons, ring_size);
+		vec[0].iov_len = size;
+		msg.msg_iter.kvec = vec;
+		msg.msg_iter.nr_segs = 1;
+	} else {
+		vec[0].iov_base = data->out + pvcalls_mask(cons, ring_size);
+		vec[0].iov_len = ring_size -
+			pvcalls_mask(cons, ring_size);
+		vec[1].iov_base = data->out;
+		vec[1].iov_len = size - vec[0].iov_len;
+		msg.msg_iter.kvec = vec;
+		msg.msg_iter.nr_segs = 2;
+	}
+
+	atomic_set(&map->write, 0);
+	ret = inet_sendmsg(map->sock, &msg, size);
+	if (ret == -EAGAIN || ret < size) {
+		atomic_inc(&map->write);
+		atomic_inc(&pvcalls_back_global.ioworkers[map->data_worker].io);
+	}
+	if (ret == -EAGAIN)
+		return ret;
+
+	/* write the data, then update the indexes */
+	virt_wmb();
+	if (ret < 0) {
+		intf->out_error = ret;
+	} else {
+		intf->out_error = 0;
+		intf->out_cons = cons + ret;
+		prod = intf->out_prod;
+	}
+	/* update the indexes, then notify the other end */
+	virt_wmb();
+	if (prod != cons + ret)
+		atomic_inc(&map->write);
+	notify_remote_via_irq(map->irq);
+
+	return ret;
 }
 
 static void pvcalls_back_ioworker(struct work_struct *work)
@@ -914,6 +974,24 @@ static irqreturn_t pvcalls_back_event(int irq, void *dev_id)
 
 static irqreturn_t pvcalls_back_conn_event(int irq, void *sock_map)
 {
+	struct sock_mapping *map = sock_map;
+	struct pvcalls_ioworker *iow;
+	unsigned long flags;
+
+	if (map == NULL || map->sock == NULL || map->sock->sk == NULL ||
+		map->sock->sk->sk_user_data != map)
+		return IRQ_HANDLED;
+
+	iow = &pvcalls_back_global.ioworkers[map->data_worker];
+	spin_lock_irqsave(&iow->lock, flags);
+	atomic_inc(&map->write);
+	if (list_empty(&map->queue))
+		list_add_tail(&map->queue, &iow->wqs);
+	spin_unlock_irqrestore(&iow->lock, flags);
+	atomic_inc(&iow->io);
+	queue_work_on(map->data_worker, pvcalls_back_global.wq,
+		&iow->register_work);
+
 	return IRQ_HANDLED;
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 18/18] xen: introduce a Kconfig option to enable the pvcalls backend
  2017-05-15 20:35 ` [PATCH 01/18] xen: introduce the pvcalls interface header Stefano Stabellini
@ 2017-05-15 20:36     ` Stefano Stabellini
  2017-05-15 20:35   ` Stefano Stabellini
                       ` (21 subsequent siblings)
  22 siblings, 0 replies; 81+ messages in thread
From: Stefano Stabellini @ 2017-05-15 20:36 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

Also add pvcalls-back to the Makefile.
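
(With both changes applied, the backend can be enabled with a kernel
config fragment along these lines:)

	CONFIG_XEN=y
	CONFIG_INET=y
	CONFIG_XEN_PVCALLS_BACKEND=y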

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/Kconfig  | 12 ++++++++++++
 drivers/xen/Makefile |  1 +
 2 files changed, 13 insertions(+)

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index f15bb3b7..bbdf059 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -196,6 +196,18 @@ config XEN_PCIDEV_BACKEND
 
 	  If in doubt, say m.
 
+config XEN_PVCALLS_BACKEND
+	bool "XEN PV Calls backend driver"
+	depends on INET && XEN
+	default n
+	help
+	  Experimental backend for the Xen PV Calls protocol
+	  (https://xenbits.xen.org/docs/unstable/misc/pvcalls.html). It
+	  allows PV Calls frontends to send POSIX calls to the backend,
+	  which implements them.
+
+	  If in doubt, say n.
+
 config XEN_SCSI_BACKEND
 	tristate "XEN SCSI backend driver"
 	depends on XEN && XEN_BACKEND && TARGET_CORE
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index 8feab810..480b928 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -38,6 +38,7 @@ obj-$(CONFIG_XEN_ACPI_PROCESSOR)	+= xen-acpi-processor.o
 obj-$(CONFIG_XEN_EFI)			+= efi.o
 obj-$(CONFIG_XEN_SCSI_BACKEND)		+= xen-scsiback.o
 obj-$(CONFIG_XEN_AUTO_XLATE)		+= xlate_mmu.o
+obj-$(CONFIG_XEN_PVCALLS_BACKEND)	+= pvcalls-back.o
 xen-evtchn-y				:= evtchn.o
 xen-gntdev-y				:= gntdev.o
 xen-gntalloc-y				:= gntalloc.o
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: [PATCH 03/18] xen/pvcalls: initialize the module and register the xenbus backend
  2017-05-15 20:35     ` Stefano Stabellini
  (?)
@ 2017-05-16  1:28     ` Boris Ostrovsky
  2017-05-16 20:05       ` Stefano Stabellini
  2017-05-16 20:05       ` Stefano Stabellini
  -1 siblings, 2 replies; 81+ messages in thread
From: Boris Ostrovsky @ 2017-05-16  1:28 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: linux-kernel, jgross, Stefano Stabellini



On 05/15/2017 04:35 PM, Stefano Stabellini wrote:
> The pvcalls backend has one ioworker per cpu: the ioworkers are
> implemented as a cpu bound workqueue, and will deal with the actual
> socket and data ring reads/writes.
>
> ioworkers are global: we only have one set for all the frontends. They
> process requests on their wqs list in order, once they are done with a
> request, they'll remove it from the list. A spinlock is used for
> protecting the list. Each ioworker is bound to a different cpu to
> maximize throughput.
>
> Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> CC: boris.ostrovsky@oracle.com
> CC: jgross@suse.com
> ---
>  drivers/xen/pvcalls-back.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 64 insertions(+)
>
> diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
> index 2dbf7d8..46a889a 100644
> --- a/drivers/xen/pvcalls-back.c
> +++ b/drivers/xen/pvcalls-back.c
> @@ -25,6 +25,26 @@
>  #include <xen/xenbus.h>
>  #include <xen/interface/io/pvcalls.h>
>
> +struct pvcalls_ioworker {
> +	struct work_struct register_work;
> +	atomic_t io;
> +	struct list_head wqs;
> +	spinlock_t lock;
> +	int num;
> +};
> +
> +struct pvcalls_back_global {
> +	struct pvcalls_ioworker *ioworkers;
> +	int nr_ioworkers;
> +	struct workqueue_struct *wq;
> +	struct list_head privs;
> +	struct rw_semaphore privs_lock;

Is there a reason why these are called "privs"?

And why are you using a rw semaphore --- I only noticed two instances of 
use and both are writes.
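
(A rw_semaphore that is only ever taken for writing is equivalent to a
plain mutex; a sketch of the alternative, for illustration only:)

	static DEFINE_MUTEX(privs_mutex);

	mutex_lock(&privs_mutex);
	list_add_tail(&priv->list, &pvcalls_back_global.privs);
	mutex_unlock(&privs_mutex);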


> +} pvcalls_back_global;
> +
> +static void pvcalls_back_ioworker(struct work_struct *work)
> +{
> +}
> +
>  static int pvcalls_back_probe(struct xenbus_device *dev,
>  			      const struct xenbus_device_id *id)
>  {
> @@ -59,3 +79,47 @@ static int pvcalls_back_uevent(struct xenbus_device *xdev,
>  	.uevent = pvcalls_back_uevent,
>  	.otherend_changed = pvcalls_back_changed,
>  };
> +
> +static int __init pvcalls_back_init(void)
> +{
> +	int ret, i, cpu;
> +
> +	if (!xen_domain())
> +		return -ENODEV;
> +
> +	ret = xenbus_register_backend(&pvcalls_back_driver);
> +	if (ret < 0)
> +		return ret;
> +
> +	init_rwsem(&pvcalls_back_global.privs_lock);
> +	INIT_LIST_HEAD(&pvcalls_back_global.privs);
> +	pvcalls_back_global.wq = alloc_workqueue("pvcalls_io", 0, 0);
> +	if (!pvcalls_back_global.wq)
> +		goto error;
> +	pvcalls_back_global.nr_ioworkers = num_online_cpus();


Should nr_ioworkers be updated on CPU hot(un)plug?
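
(One way to track that would be the cpuhp API; entirely illustrative,
the callback names are invented and the series does not do this:)

	static int pvcalls_cpu_online(unsigned int cpu)
	{
		/* allocate/activate the ioworker for this cpu */
		return 0;
	}

	static int pvcalls_cpu_offline(unsigned int cpu)
	{
		/* quiesce the ioworker bound to this cpu */
		return 0;
	}

	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "xen/pvcalls:online",
				pvcalls_cpu_online, pvcalls_cpu_offline);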


> +	pvcalls_back_global.ioworkers = kzalloc(
> +		sizeof(*pvcalls_back_global.ioworkers) *
> +		pvcalls_back_global.nr_ioworkers, GFP_KERNEL);
> +	if (!pvcalls_back_global.ioworkers)
> +		goto error;
> +	i = 0;
> +	for_each_online_cpu(cpu) {
> +		pvcalls_back_global.ioworkers[i].num = i;
> +		atomic_set(&pvcalls_back_global.ioworkers[i].io, 1);
> +		spin_lock_init(&pvcalls_back_global.ioworkers[i].lock);
> +		INIT_LIST_HEAD(&pvcalls_back_global.ioworkers[i].wqs);
> +		INIT_WORK(&pvcalls_back_global.ioworkers[i].register_work,
> +			pvcalls_back_ioworker);
> +		i++;
> +	}
> +	return 0;
> +
> +error:
> +	if (pvcalls_back_global.wq)
> +		destroy_workqueue(pvcalls_back_global.wq);
> +	xenbus_unregister_driver(&pvcalls_back_driver);
> +	kfree(pvcalls_back_global.ioworkers);
> +	memset(&pvcalls_back_global, 0, sizeof(pvcalls_back_global));
> +	return -ENOMEM;

This routine could use more newlines. (and in other patches too)

-boris

> +}
> +module_init(pvcalls_back_init);
>

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 04/18] xen/pvcalls: xenbus state handling
  2017-05-15 20:35     ` Stefano Stabellini
@ 2017-05-16  1:34       ` Boris Ostrovsky
  -1 siblings, 0 replies; 81+ messages in thread
From: Boris Ostrovsky @ 2017-05-16  1:34 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: linux-kernel, jgross, Stefano Stabellini



On 05/15/2017 04:35 PM, Stefano Stabellini wrote:
> Introduce the code to handle xenbus state changes.
>
> Implement the probe function for the pvcalls backend. Write the
> supported versions, max-page-order and function-calls nodes to xenstore,
> as required by the protocol.
>
> Introduce stub functions for disconnecting/connecting to a frontend.
>
> Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> CC: boris.ostrovsky@oracle.com
> CC: jgross@suse.com
> ---
>  drivers/xen/pvcalls-back.c | 133 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 133 insertions(+)
>
> diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
> index 46a889a..86eca19 100644
> --- a/drivers/xen/pvcalls-back.c
> +++ b/drivers/xen/pvcalls-back.c
> @@ -25,6 +25,9 @@
>  #include <xen/xenbus.h>
>  #include <xen/interface/io/pvcalls.h>
>
> +#define PVCALLS_VERSIONS "1"
> +#define MAX_RING_ORDER XENBUS_MAX_RING_GRANT_ORDER
> +
>  struct pvcalls_ioworker {
>  	struct work_struct register_work;
>  	atomic_t io;
> @@ -45,15 +48,145 @@ static void pvcalls_back_ioworker(struct work_struct *work)
>  {
>  }
>
> +static int backend_connect(struct xenbus_device *dev)
> +{
> +	return 0;
> +}
> +
> +static int backend_disconnect(struct xenbus_device *dev)
> +{
> +	return 0;
> +}
> +
>  static int pvcalls_back_probe(struct xenbus_device *dev,
>  			      const struct xenbus_device_id *id)
>  {
> +	int err;
> +
> +	err = xenbus_printf(XBT_NIL, dev->nodename, "versions", "%s",
> +			    PVCALLS_VERSIONS);
> +	if (err) {
> +		pr_warn("%s write out 'version' failed\n", __func__);
> +		return -EINVAL;

Why not return err? (below too)


> +	}
> +
> +	err = xenbus_printf(XBT_NIL, dev->nodename, "max-page-order", "%u",
> +			    MAX_RING_ORDER);
> +	if (err) {
> +		pr_warn("%s write out 'max-page-order' failed\n", __func__);
> +		return -EINVAL;
> +	}
> +
> +	/* "1" means socket, connect, release, bind, listen, accept and poll*/
> +	err = xenbus_printf(XBT_NIL, dev->nodename, "function-calls", "1");


Should "1" be defined in the (public) header file?
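
(For instance a hypothetical constant next to the ring definitions;
the name here is invented for illustration:)

	/* "1" == socket, connect, release, bind, listen, accept, poll */
	#define PVCALLS_FUNCTION_CALLS_BASIC "1"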


> +	if (err) {
> +		pr_warn("%s write out 'function-calls' failed\n", __func__);
> +		return -EINVAL;
> +	}
> +
> +	err = xenbus_switch_state(dev, XenbusStateInitWait);
> +	if (err)
> +		return err;
> +
>  	return 0;
>  }
>
> +static void set_backend_state(struct xenbus_device *dev,
> +			      enum xenbus_state state)
> +{
> +	while (dev->state != state) {
> +		switch (dev->state) {
> +		case XenbusStateClosed:
> +			switch (state) {
> +			case XenbusStateInitWait:
> +			case XenbusStateConnected:
> +				xenbus_switch_state(dev, XenbusStateInitWait);
> +				break;
> +			case XenbusStateClosing:
> +				xenbus_switch_state(dev, XenbusStateClosing);
> +				break;
> +			default:
> +				__WARN();
> +			}
> +			break;
> +		case XenbusStateInitWait:
> +		case XenbusStateInitialised:
> +			switch (state) {
> +			case XenbusStateConnected:
> +				backend_connect(dev);
> +				xenbus_switch_state(dev, XenbusStateConnected);
> +				break;
> +			case XenbusStateClosing:
> +			case XenbusStateClosed:
> +				xenbus_switch_state(dev, XenbusStateClosing);
> +				break;
> +			default:
> +				__WARN();
> +			}
> +			break;
> +		case XenbusStateConnected:
> +			switch (state) {
> +			case XenbusStateInitWait:
> +			case XenbusStateClosing:
> +			case XenbusStateClosed:
> +				down_write(&pvcalls_back_global.privs_lock);
> +				backend_disconnect(dev);
> +				up_write(&pvcalls_back_global.privs_lock);


Unless you plan to have more stuff under the semaphore, I'd consider 
putting them in backend_disconnect().


> +				xenbus_switch_state(dev, XenbusStateClosing);
> +				break;
> +			default:
> +				__WARN();
> +			}
> +			break;
> +		case XenbusStateClosing:
> +			switch (state) {
> +			case XenbusStateInitWait:
> +			case XenbusStateConnected:
> +			case XenbusStateClosed:
> +				xenbus_switch_state(dev, XenbusStateClosed);
> +				break;
> +			default:
> +				__WARN();
> +			}
> +			break;
> +		default:
> +			__WARN();
> +		}
> +	}
> +}
> +
>  static void pvcalls_back_changed(struct xenbus_device *dev,
>  				 enum xenbus_state frontend_state)
>  {
> +	switch (frontend_state) {
> +	case XenbusStateInitialising:
> +		set_backend_state(dev, XenbusStateInitWait);
> +		break;
> +
> +	case XenbusStateInitialised:
> +	case XenbusStateConnected:
> +		set_backend_state(dev, XenbusStateConnected);
> +		break;
> +
> +	case XenbusStateClosing:
> +		set_backend_state(dev, XenbusStateClosing);
> +		break;
> +
> +	case XenbusStateClosed:
> +		set_backend_state(dev, XenbusStateClosed);
> +		if (xenbus_dev_is_online(dev))
> +			break;
> +		/* fall through if not online */
> +	case XenbusStateUnknown:
> +		set_backend_state(dev, XenbusStateClosed);


You are setting XenbusStateClosed twice in case of fallthrough.
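
(One possible restructure that avoids the duplicate transition,
sketched for illustration only:)

	case XenbusStateClosed:
	case XenbusStateUnknown:
		set_backend_state(dev, XenbusStateClosed);
		if (frontend_state == XenbusStateUnknown ||
		    !xenbus_dev_is_online(dev))
			device_unregister(&dev->dev);
		break;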

-boris


> +		device_unregister(&dev->dev);
> +		break;
> +
> +	default:
> +		xenbus_dev_fatal(dev, -EINVAL, "saw state %d at frontend",
> +				 frontend_state);
> +		break;
> +	}
>  }
>
>  static int pvcalls_back_remove(struct xenbus_device *dev)
>

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 05/18] xen/pvcalls: connect to a frontend
  2017-05-15 20:35     ` Stefano Stabellini
  (?)
  (?)
@ 2017-05-16  1:52     ` Boris Ostrovsky
  2017-05-16 20:23       ` Stefano Stabellini
  2017-05-16 20:23       ` Stefano Stabellini
  -1 siblings, 2 replies; 81+ messages in thread
From: Boris Ostrovsky @ 2017-05-16  1:52 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: linux-kernel, jgross, Stefano Stabellini



On 05/15/2017 04:35 PM, Stefano Stabellini wrote:
> Introduce a per-frontend data structure named pvcalls_back_priv. It
> contains pointers to the command ring, its event channel, a list of
> active sockets and a tree of passive sockets (passive sockets need to be
> looked up from the id on listen, accept and poll commands, while active
> sockets only on release).

It would be useful to put this into a comment in pvcalls_back_priv 
definition.
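
(For instance, sketched with comments lifted from the commit message:)

	struct pvcalls_back_priv {
		struct list_head list;
		struct xenbus_device *dev;
		struct xen_pvcalls_sring *sring;
		struct xen_pvcalls_back_ring ring;
		int irq;
		/* active sockets: looked up only on release */
		struct list_head socket_mappings;
		/* passive sockets: looked up on listen/accept/poll */
		struct radix_tree_root socketpass_mappings;
		struct rw_semaphore pvcallss_lock;	/* protects both */
		atomic_t work;
		struct workqueue_struct *wq;
		struct work_struct register_work;
	};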

>
> It also has an unbound workqueue to schedule the work of parsing and
> executing commands on the command ring. pvcallss_lock protects the two
> lists. In pvcalls_back_global, keep a list of connected frontends.
>
> Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> CC: boris.ostrovsky@oracle.com
> CC: jgross@suse.com
> ---
>  drivers/xen/pvcalls-back.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 87 insertions(+)
>
> diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
> index 86eca19..876e577 100644
> --- a/drivers/xen/pvcalls-back.c
> +++ b/drivers/xen/pvcalls-back.c
> @@ -44,13 +44,100 @@ struct pvcalls_back_global {
>  	struct rw_semaphore privs_lock;
>  } pvcalls_back_global;
>
> +struct pvcalls_back_priv {
> +	struct list_head list;
> +	struct xenbus_device *dev;
> +	struct xen_pvcalls_sring *sring;
> +	struct xen_pvcalls_back_ring ring;
> +	int irq;
> +	struct list_head socket_mappings;
> +	struct radix_tree_root socketpass_mappings;
> +	struct rw_semaphore pvcallss_lock;

Same question as before regarding using rw semaphore --- I only see 
down/up_writes.

And what does the name (pvcallss) stand for?


> +	atomic_t work;
> +	struct workqueue_struct *wq;
> +	struct work_struct register_work;
> +};
> +
>  static void pvcalls_back_ioworker(struct work_struct *work)
>  {
>  }
>
> +static void pvcalls_back_work(struct work_struct *work)
> +{
> +}
> +
> +static irqreturn_t pvcalls_back_event(int irq, void *dev_id)
> +{
> +	return IRQ_HANDLED;
> +}
> +
>  static int backend_connect(struct xenbus_device *dev)
>  {
> +	int err, evtchn;
> +	grant_ref_t ring_ref;
> +	void *addr = NULL;
> +	struct pvcalls_back_priv *priv = NULL;
> +
> +	priv = kzalloc(sizeof(struct pvcalls_back_priv), GFP_KERNEL);
> +	if (!priv)
> +		return -ENOMEM;
> +
> +	err = xenbus_scanf(XBT_NIL, dev->otherend, "port", "%u",
> +			   &evtchn);
> +	if (err != 1) {
> +		err = -EINVAL;
> +		xenbus_dev_fatal(dev, err, "reading %s/event-channel",
> +				 dev->otherend);
> +		goto error;
> +	}
> +
> +	err = xenbus_scanf(XBT_NIL, dev->otherend, "ring-ref", "%u", &ring_ref);
> +	if (err != 1) {
> +		err = -EINVAL;
> +		xenbus_dev_fatal(dev, err, "reading %s/ring-ref",
> +				 dev->otherend);
> +		goto error;
> +	}
> +
> +	err = xenbus_map_ring_valloc(dev, &ring_ref, 1, &addr);
> +	if (err < 0)
> +		goto error;


I'd move this closer to first use, below.

-boris

> +
> +	err = bind_interdomain_evtchn_to_irqhandler(dev->otherend_id, evtchn,
> +						    pvcalls_back_event, 0,
> +						    "pvcalls-backend", dev);
> +	if (err < 0)
> +		goto error;
> +
> +	priv->wq = alloc_workqueue("pvcalls_back_wq", WQ_UNBOUND, 1);
> +	if (!priv->wq) {
> +		err = -ENOMEM;
> +		goto error;
> +	}
> +	INIT_WORK(&priv->register_work, pvcalls_back_work);
> +	priv->dev = dev;
> +	priv->sring = addr;
> +	BACK_RING_INIT(&priv->ring, priv->sring, XEN_PAGE_SIZE * 1);
> +	priv->irq = err;
> +	INIT_LIST_HEAD(&priv->socket_mappings);
> +	INIT_RADIX_TREE(&priv->socketpass_mappings, GFP_KERNEL);
> +	init_rwsem(&priv->pvcallss_lock);
> +	dev_set_drvdata(&dev->dev, priv);
> +	down_write(&pvcalls_back_global.privs_lock);
> +	list_add_tail(&priv->list, &pvcalls_back_global.privs);
> +	up_write(&pvcalls_back_global.privs_lock);
> +	queue_work(priv->wq, &priv->register_work);
> +
>  	return 0;
> +
> + error:
> +	if (addr != NULL)
> +		xenbus_unmap_ring_vfree(dev, addr);
> +	if (priv->wq)
> +		destroy_workqueue(priv->wq);
> +	unbind_from_irqhandler(priv->irq, dev);
> +	kfree(priv);
> +	return err;
>  }
>
>  static int backend_disconnect(struct xenbus_device *dev)
>

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 06/18] xen/pvcalls: handle commands from the frontend
  2017-05-15 20:35     ` Stefano Stabellini
  (?)
@ 2017-05-16  2:06     ` Boris Ostrovsky
  2017-05-16 20:57       ` Stefano Stabellini
  2017-05-16 20:57       ` Stefano Stabellini
  -1 siblings, 2 replies; 81+ messages in thread
From: Boris Ostrovsky @ 2017-05-16  2:06 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: linux-kernel, jgross, Stefano Stabellini



On 05/15/2017 04:35 PM, Stefano Stabellini wrote:
> When the other end notifies us that there are commands to be read
> (pvcalls_back_event), wake up the backend thread to parse the command.
>
> The command ring works like most other Xen rings, so use the usual
> ring macros to read and write to it. The functions implementing the
> commands are empty stubs for now.
>
> Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> CC: boris.ostrovsky@oracle.com
> CC: jgross@suse.com
> ---
>  drivers/xen/pvcalls-back.c | 115 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 115 insertions(+)
>
> diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
> index 876e577..2b2a49a 100644
> --- a/drivers/xen/pvcalls-back.c
> +++ b/drivers/xen/pvcalls-back.c
> @@ -62,12 +62,127 @@ static void pvcalls_back_ioworker(struct work_struct *work)
>  {
>  }
>
> +static int pvcalls_back_socket(struct xenbus_device *dev,
> +		struct xen_pvcalls_request *req)
> +{
> +	return 0;
> +}
> +
> +static int pvcalls_back_connect(struct xenbus_device *dev,
> +				struct xen_pvcalls_request *req)
> +{
> +	return 0;
> +}
> +
> +static int pvcalls_back_release(struct xenbus_device *dev,
> +				struct xen_pvcalls_request *req)
> +{
> +	return 0;
> +}
> +
> +static int pvcalls_back_bind(struct xenbus_device *dev,
> +			     struct xen_pvcalls_request *req)
> +{
> +	return 0;
> +}
> +
> +static int pvcalls_back_listen(struct xenbus_device *dev,
> +			       struct xen_pvcalls_request *req)
> +{
> +	return 0;
> +}
> +
> +static int pvcalls_back_accept(struct xenbus_device *dev,
> +			       struct xen_pvcalls_request *req)
> +{
> +	return 0;
> +}
> +
> +static int pvcalls_back_poll(struct xenbus_device *dev,
> +			     struct xen_pvcalls_request *req)
> +{
> +	return 0;
> +}
> +
> +static int pvcalls_back_handle_cmd(struct xenbus_device *dev,
> +				   struct xen_pvcalls_request *req)
> +{
> +	int ret = 0;
> +
> +	switch (req->cmd) {
> +	case PVCALLS_SOCKET:
> +		ret = pvcalls_back_socket(dev, req);
> +		break;
> +	case PVCALLS_CONNECT:
> +		ret = pvcalls_back_connect(dev, req);
> +		break;
> +	case PVCALLS_RELEASE:
> +		ret = pvcalls_back_release(dev, req);
> +		break;
> +	case PVCALLS_BIND:
> +		ret = pvcalls_back_bind(dev, req);
> +		break;
> +	case PVCALLS_LISTEN:
> +		ret = pvcalls_back_listen(dev, req);
> +		break;
> +	case PVCALLS_ACCEPT:
> +		ret = pvcalls_back_accept(dev, req);
> +		break;
> +	case PVCALLS_POLL:
> +		ret = pvcalls_back_poll(dev, req);
> +		break;
> +	default:
> +		ret = -ENOTSUPP;
> +		break;
> +	}
> +	return ret;
> +}
> +
>  static void pvcalls_back_work(struct work_struct *work)
>  {
> +	struct pvcalls_back_priv *priv = container_of(work,
> +		struct pvcalls_back_priv, register_work);
> +	int notify, notify_all = 0, more = 1;
> +	struct xen_pvcalls_request req;
> +	struct xenbus_device *dev = priv->dev;
> +
> +	atomic_set(&priv->work, 1);
> +
> +	while (more || !atomic_dec_and_test(&priv->work)) {
> +		while (RING_HAS_UNCONSUMED_REQUESTS(&priv->ring)) {
> +			RING_COPY_REQUEST(&priv->ring,
> +					  priv->ring.req_cons++,
> +					  &req);
> +
> +			if (pvcalls_back_handle_cmd(dev, &req) > 0) {

Can you make the handlers use "traditional" returns, i.e. <0 on error
and 0 on success? Or do you really need to distinguish 0 from >0?
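
Just to illustrate, assuming every command ends up queueing a response
anyway, the caller could then do roughly:

	ret = pvcalls_back_handle_cmd(dev, &req);
	if (ret < 0)
		pr_warn("%s: error handling command %d: %d\n",
			__func__, req.cmd, ret);
	RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&priv->ring, notify);
	notify_all += notify;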

> +				RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(
> +					&priv->ring, notify);
> +				notify_all += notify;
> +			}
> +		}
> +
> +		if (notify_all)
> +			notify_remote_via_irq(priv->irq);
> +
> +		RING_FINAL_CHECK_FOR_REQUESTS(&priv->ring, more);
> +	}
>  }
>
>  static irqreturn_t pvcalls_back_event(int irq, void *dev_id)
>  {
> +	struct xenbus_device *dev = dev_id;
> +	struct pvcalls_back_priv *priv = NULL;
> +
> +	if (dev == NULL)
> +		return IRQ_HANDLED;
> +
> +	priv = dev_get_drvdata(&dev->dev);
> +	if (priv == NULL)
> +		return IRQ_HANDLED;

These two aren't errors?

> +
> +	atomic_inc(&priv->work);

Is this really needed? We have a new entry on the ring, so the outer 
loop in pvcalls_back_work() will pick this up (by setting 'more').
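
The handler could then be reduced to just queueing the work, something
like (a sketch, assuming the NULL checks above are dropped or turned
into warnings):

	static irqreturn_t pvcalls_back_event(int irq, void *dev_id)
	{
		struct xenbus_device *dev = dev_id;
		struct pvcalls_back_priv *priv = dev_get_drvdata(&dev->dev);

		queue_work(priv->wq, &priv->register_work);
		return IRQ_HANDLED;
	}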


-boris

> +	queue_work(priv->wq, &priv->register_work);
> +
>  	return IRQ_HANDLED;
>  }
>
>

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 07/18] xen/pvcalls: implement socket command
  2017-05-15 20:35     ` Stefano Stabellini
  (?)
@ 2017-05-16  2:12     ` Boris Ostrovsky
  2017-05-16 20:45       ` Stefano Stabellini
  2017-05-16 20:45       ` Stefano Stabellini
  -1 siblings, 2 replies; 81+ messages in thread
From: Boris Ostrovsky @ 2017-05-16  2:12 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: linux-kernel, jgross, Stefano Stabellini



On 05/15/2017 04:35 PM, Stefano Stabellini wrote:
> Just reply with success to the other end for now. Delay the allocation
> of the actual socket to bind and/or connect.
>
> Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> CC: boris.ostrovsky@oracle.com
> CC: jgross@suse.com
> ---
>  drivers/xen/pvcalls-back.c | 31 ++++++++++++++++++++++++++++++-
>  1 file changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
> index 2b2a49a..2eae096 100644
> --- a/drivers/xen/pvcalls-back.c
> +++ b/drivers/xen/pvcalls-back.c
> @@ -12,12 +12,17 @@
>   * GNU General Public License for more details.
>   */
>
> +#include <linux/inet.h>
>  #include <linux/kthread.h>
>  #include <linux/list.h>
>  #include <linux/radix-tree.h>
>  #include <linux/module.h>
>  #include <linux/rwsem.h>
>  #include <linux/wait.h>
> +#include <net/sock.h>
> +#include <net/inet_common.h>
> +#include <net/inet_connection_sock.h>
> +#include <net/request_sock.h>
>
>  #include <xen/events.h>
>  #include <xen/grant_table.h>
> @@ -65,7 +70,31 @@ static void pvcalls_back_ioworker(struct work_struct *work)
>  static int pvcalls_back_socket(struct xenbus_device *dev,
>  		struct xen_pvcalls_request *req)
>  {
> -	return 0;
> +	struct pvcalls_back_priv *priv;
> +	int ret;
> +	struct xen_pvcalls_response *rsp;
> +
> +	if (dev == NULL)
> +		return 0;
> +	priv = dev_get_drvdata(&dev->dev);

This is inconsistent with pvcalls_back_event() tests, where you check 
both for NULL. OTOH, I am not sure a check is needed at all since you've 
just tested these in pvcalls_back_event().


-boris

> +
> +	if (req->u.socket.domain != AF_INET ||
> +	    req->u.socket.type != SOCK_STREAM ||
> +	    (req->u.socket.protocol != 0 &&
> +	     req->u.socket.protocol != AF_INET))
> +		ret = -EAFNOSUPPORT;
> +	else
> +		ret = 0;
> +
> +	/* leave the actual socket allocation for later */
> +
> +	rsp = RING_GET_RESPONSE(&priv->ring, priv->ring.rsp_prod_pvt++);
> +	rsp->req_id = req->req_id;
> +	rsp->cmd = req->cmd;
> +	rsp->u.socket.id = req->u.socket.id;
> +	rsp->ret = ret;
> +
> +	return 1;
>  }
>
>  static int pvcalls_back_connect(struct xenbus_device *dev,
>

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 08/18] xen/pvcalls: implement connect command
  2017-05-15 20:36   ` [PATCH 08/18] xen/pvcalls: implement connect command Stefano Stabellini
@ 2017-05-16  2:36     ` Boris Ostrovsky
  2017-05-16 21:02       ` Stefano Stabellini
  2017-05-16 21:02       ` Stefano Stabellini
  2017-05-16  2:36     ` Boris Ostrovsky
  1 sibling, 2 replies; 81+ messages in thread
From: Boris Ostrovsky @ 2017-05-16  2:36 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: linux-kernel, jgross, Stefano Stabellini



On 05/15/2017 04:36 PM, Stefano Stabellini wrote:
> Allocate a socket. Keep track of socket <-> ring mappings with a new data
> structure, called sock_mapping. Implement the connect command by calling
> inet_stream_connect, and mapping the new indexes page and data ring.
> Associate the socket to an ioworker randomly.
>
> When an active socket is closed (sk_state_change), set in_error to
> -ENOTCONN and notify the other end, as specified by the protocol.
>
> sk_data_ready will be implemented later.
>
> Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> CC: boris.ostrovsky@oracle.com
> CC: jgross@suse.com
> ---
>  drivers/xen/pvcalls-back.c | 145 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 145 insertions(+)
>
> diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
> index 2eae096..9ac1cf2 100644
> --- a/drivers/xen/pvcalls-back.c
> +++ b/drivers/xen/pvcalls-back.c
> @@ -63,6 +63,29 @@ struct pvcalls_back_priv {
>  	struct work_struct register_work;
>  };
>
> +struct sock_mapping {
> +	struct list_head list;
> +	struct list_head queue;

Since you have two lists it would be helpful if names were a bit more 
descriptive.

(and comments for at least some fields would be welcome too)
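
Something along these lines perhaps (just a sketch; I am guessing at
the semantics from the rest of the series):

	struct sock_mapping {
		struct list_head list;	/* entry in priv->socket_mappings */
		struct list_head queue;	/* entry in an ioworker's wqs list */
		...
	};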

> +	struct pvcalls_back_priv *priv;
> +	struct socket *sock;
> +	int data_worker;
> +	uint64_t id;
> +	grant_ref_t ref;
> +	struct pvcalls_data_intf *ring;
> +	void *bytes;
> +	struct pvcalls_data data;
> +	uint32_t ring_order;
> +	int irq;
> +	atomic_t read;
> +	atomic_t write;
> +	atomic_t release;
> +	void (*saved_data_ready)(struct sock *sk);
> +};
> +
> +static irqreturn_t pvcalls_back_conn_event(int irq, void *sock_map);
> +static int pvcalls_back_release_active(struct xenbus_device *dev,
> +				       struct pvcalls_back_priv *priv,
> +				       struct sock_mapping *map);
>  static void pvcalls_back_ioworker(struct work_struct *work)
>  {
>  }
> @@ -97,9 +120,126 @@ static int pvcalls_back_socket(struct xenbus_device *dev,
>  	return 1;
>  }
>
> +static void pvcalls_sk_state_change(struct sock *sock)
> +{
> +	struct sock_mapping *map = sock->sk_user_data;
> +	struct pvcalls_data_intf *intf;
> +
> +	if (map == NULL)
> +		return;
> +
> +	intf = map->ring;
> +	intf->in_error = -ENOTCONN;
> +	notify_remote_via_irq(map->irq);
> +}
> +
> +static void pvcalls_sk_data_ready(struct sock *sock)
> +{
> +}
> +
>  static int pvcalls_back_connect(struct xenbus_device *dev,
>  				struct xen_pvcalls_request *req)
>  {
> +	struct pvcalls_back_priv *priv;
> +	int ret;
> +	struct socket *sock;
> +	struct sock_mapping *map = NULL;
> +	void *page;
> +	struct xen_pvcalls_response *rsp;
> +
> +	if (dev == NULL)
> +		return 0;
> +	priv = dev_get_drvdata(&dev->dev);
> +
> +	map = kzalloc(sizeof(*map), GFP_KERNEL);
> +	if (map == NULL) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +	ret = sock_create(AF_INET, SOCK_STREAM, 0, &sock);
> +	if (ret < 0) {
> +		kfree(map);
> +		goto out;
> +	}
> +	INIT_LIST_HEAD(&map->queue);
> +	map->data_worker = get_random_int() % pvcalls_back_global.nr_ioworkers;
> +
> +	map->priv = priv;
> +	map->sock = sock;
> +	map->id = req->u.connect.id;
> +	map->ref = req->u.connect.ref;
> +
> +	ret = xenbus_map_ring_valloc(dev, &req->u.connect.ref, 1, &page);
> +	if (ret < 0) {
> +		sock_release(map->sock);
> +		kfree(map);
> +		goto out;
> +	}
> +	map->ring = page;
> +	map->ring_order = map->ring->ring_order;
> +	/* first read the order, then map the data ring */
> +	virt_rmb();


Not sure I understand what the barrier is for here. I don't think the
compiler will reorder the ring_order access with the call.


> +	if (map->ring_order > MAX_RING_ORDER) {
> +		ret = -EFAULT;
> +		goto out;
> +	}

If the barrier is indeed needed this check belongs before it.
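
I.e., as a sketch:

	map->ring_order = map->ring->ring_order;
	if (map->ring_order > MAX_RING_ORDER) {
		ret = -EFAULT;
		goto out;
	}
	/* read (and validate) the order before mapping the data ring */
	virt_rmb();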

-boris


> +	ret = xenbus_map_ring_valloc(dev, map->ring->ref,
> +				     (1 << map->ring_order), &page);
> +	if (ret < 0) {
> +		sock_release(map->sock);
> +		xenbus_unmap_ring_vfree(dev, map->ring);
> +		kfree(map);
> +		goto out;
> +	}
> +	map->bytes = page;
> +
> +	ret = bind_interdomain_evtchn_to_irqhandler(priv->dev->otherend_id,
> +						    req->u.connect.evtchn,
> +						    pvcalls_back_conn_event,
> +						    0,
> +						    "pvcalls-backend",
> +						    map);
> +	if (ret < 0) {
> +		sock_release(map->sock);
> +		kfree(map);
> +		goto out;
> +	}
> +	map->irq = ret;
> +
> +	map->data.in = map->bytes;
> +	map->data.out = map->bytes + XEN_FLEX_RING_SIZE(map->ring_order);
> +
> +	down_write(&priv->pvcallss_lock);
> +	list_add_tail(&map->list, &priv->socket_mappings);
> +	up_write(&priv->pvcallss_lock);
> +
> +	ret = inet_stream_connect(sock, (struct sockaddr *)&req->u.connect.addr,
> +				  req->u.connect.len, req->u.connect.flags);
> +	if (ret < 0) {
> +		pvcalls_back_release_active(dev, priv, map);
> +	} else {
> +		lock_sock(sock->sk);
> +		map->saved_data_ready = sock->sk->sk_data_ready;
> +		sock->sk->sk_user_data = map;
> +		sock->sk->sk_data_ready = pvcalls_sk_data_ready;
> +		sock->sk->sk_state_change = pvcalls_sk_state_change;
> +		release_sock(sock->sk);
> +	}
> +
> +out:
> +	rsp = RING_GET_RESPONSE(&priv->ring, priv->ring.rsp_prod_pvt++);
> +	rsp->req_id = req->req_id;
> +	rsp->cmd = req->cmd;
> +	rsp->u.connect.id = req->u.connect.id;
> +	rsp->ret = ret;
> +
> +	return 1;
> +}
> +
> +static int pvcalls_back_release_active(struct xenbus_device *dev,
> +				       struct pvcalls_back_priv *priv,
> +				       struct sock_mapping *map)
> +{
>  	return 0;
>  }
>
> @@ -215,6 +355,11 @@ static irqreturn_t pvcalls_back_event(int irq, void *dev_id)
>  	return IRQ_HANDLED;
>  }
>
> +static irqreturn_t pvcalls_back_conn_event(int irq, void *sock_map)
> +{
> +	return IRQ_HANDLED;
> +}
> +
>  static int backend_connect(struct xenbus_device *dev)
>  {
>  	int err, evtchn;
>

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 03/18] xen/pvcalls: initialize the module and register the xenbus backend
  2017-05-15 20:35     ` Stefano Stabellini
                       ` (2 preceding siblings ...)
  (?)
@ 2017-05-16  6:40     ` Juergen Gross
  2017-05-16 19:58       ` Stefano Stabellini
  2017-05-16 19:58       ` Stefano Stabellini
  -1 siblings, 2 replies; 81+ messages in thread
From: Juergen Gross @ 2017-05-16  6:40 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel
  Cc: linux-kernel, boris.ostrovsky, Stefano Stabellini

On 15/05/17 22:35, Stefano Stabellini wrote:
> The pvcalls backend has one ioworker per cpu: the ioworkers are
> implemented as a cpu bound workqueue, and will deal with the actual
> socket and data ring reads/writes.
> 
> ioworkers are global: we only have one set for all the frontends. They
> process requests on their wqs list in order, once they are done with a
> request, they'll remove it from the list. A spinlock is used for
> protecting the list. Each ioworker is bound to a different cpu to
> maximize throughput.
> 
> Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> CC: boris.ostrovsky@oracle.com
> CC: jgross@suse.com
> ---
>  drivers/xen/pvcalls-back.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 64 insertions(+)
> 
> diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
> index 2dbf7d8..46a889a 100644
> --- a/drivers/xen/pvcalls-back.c
> +++ b/drivers/xen/pvcalls-back.c
> @@ -25,6 +25,26 @@
>  #include <xen/xenbus.h>
>  #include <xen/interface/io/pvcalls.h>
>  
> +struct pvcalls_ioworker {
> +	struct work_struct register_work;
> +	atomic_t io;
> +	struct list_head wqs;
> +	spinlock_t lock;
> +	int num;
> +};
> +
> +struct pvcalls_back_global {
> +	struct pvcalls_ioworker *ioworkers;
> +	int nr_ioworkers;
> +	struct workqueue_struct *wq;
> +	struct list_head privs;
> +	struct rw_semaphore privs_lock;
> +} pvcalls_back_global;
> +
> +static void pvcalls_back_ioworker(struct work_struct *work)
> +{
> +}
> +
>  static int pvcalls_back_probe(struct xenbus_device *dev,
>  			      const struct xenbus_device_id *id)
>  {
> @@ -59,3 +79,47 @@ static int pvcalls_back_uevent(struct xenbus_device *xdev,
>  	.uevent = pvcalls_back_uevent,
>  	.otherend_changed = pvcalls_back_changed,
>  };
> +
> +static int __init pvcalls_back_init(void)
> +{
> +	int ret, i, cpu;
> +
> +	if (!xen_domain())
> +		return -ENODEV;
> +
> +	ret = xenbus_register_backend(&pvcalls_back_driver);
> +	if (ret < 0)
> +		return ret;
> +
> +	init_rwsem(&pvcalls_back_global.privs_lock);
> +	INIT_LIST_HEAD(&pvcalls_back_global.privs);
> +	pvcalls_back_global.wq = alloc_workqueue("pvcalls_io", 0, 0);
> +	if (!pvcalls_back_global.wq)
> +		goto error;
> +	pvcalls_back_global.nr_ioworkers = num_online_cpus();

Really? Recently I came across a system with 640 dom0 cpus. I don't
think we want 640 workers initialized when loading the backend module.
I'd prefer one or a few workers per connected frontend.

> +	pvcalls_back_global.ioworkers = kzalloc(
> +		sizeof(*pvcalls_back_global.ioworkers) *
> +		pvcalls_back_global.nr_ioworkers, GFP_KERNEL);

kcalloc()?
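
I.e., something like:

	pvcalls_back_global.ioworkers =
		kcalloc(pvcalls_back_global.nr_ioworkers,
			sizeof(*pvcalls_back_global.ioworkers), GFP_KERNEL);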

> +	if (!pvcalls_back_global.ioworkers)
> +		goto error;
> +	i = 0;
> +	for_each_online_cpu(cpu) {
> +		pvcalls_back_global.ioworkers[i].num = i;
> +		atomic_set(&pvcalls_back_global.ioworkers[i].io, 1);
> +		spin_lock_init(&pvcalls_back_global.ioworkers[i].lock);
> +		INIT_LIST_HEAD(&pvcalls_back_global.ioworkers[i].wqs);
> +		INIT_WORK(&pvcalls_back_global.ioworkers[i].register_work,
> +			pvcalls_back_ioworker);
> +		i++;
> +	}
> +	return 0;
> +
> +error:
> +	if (pvcalls_back_global.wq)
> +		destroy_workqueue(pvcalls_back_global.wq);
> +	xenbus_unregister_driver(&pvcalls_back_driver);
> +	kfree(pvcalls_back_global.ioworkers);
> +	memset(&pvcalls_back_global, 0, sizeof(pvcalls_back_global));
> +	return -ENOMEM;
> +}
> +module_init(pvcalls_back_init);
> 

Juergen

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 03/18] xen/pvcalls: initialize the module and register the xenbus backend
  2017-05-16  6:40     ` Juergen Gross
  2017-05-16 19:58       ` Stefano Stabellini
@ 2017-05-16 19:58       ` Stefano Stabellini
  2017-05-17  5:21         ` Juergen Gross
  2017-05-17  5:21         ` Juergen Gross
  1 sibling, 2 replies; 81+ messages in thread
From: Stefano Stabellini @ 2017-05-16 19:58 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Stefano Stabellini, xen-devel, linux-kernel, boris.ostrovsky,
	Stefano Stabellini

On Tue, 16 May 2017, Juergen Gross wrote:
> On 15/05/17 22:35, Stefano Stabellini wrote:
> > The pvcalls backend has one ioworker per cpu: the ioworkers are
> > implemented as a cpu bound workqueue, and will deal with the actual
> > socket and data ring reads/writes.
> > 
> > ioworkers are global: we only have one set for all the frontends. They
> > process requests on their wqs list in order, once they are done with a
> > request, they'll remove it from the list. A spinlock is used for
> > protecting the list. Each ioworker is bound to a different cpu to
> > maximize throughput.
> > 
> > Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> > CC: boris.ostrovsky@oracle.com
> > CC: jgross@suse.com
> > ---
> >  drivers/xen/pvcalls-back.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 64 insertions(+)
> > 
> > diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
> > index 2dbf7d8..46a889a 100644
> > --- a/drivers/xen/pvcalls-back.c
> > +++ b/drivers/xen/pvcalls-back.c
> > @@ -25,6 +25,26 @@
> >  #include <xen/xenbus.h>
> >  #include <xen/interface/io/pvcalls.h>
> >  
> > +struct pvcalls_ioworker {
> > +	struct work_struct register_work;
> > +	atomic_t io;
> > +	struct list_head wqs;
> > +	spinlock_t lock;
> > +	int num;
> > +};
> > +
> > +struct pvcalls_back_global {
> > +	struct pvcalls_ioworker *ioworkers;
> > +	int nr_ioworkers;
> > +	struct workqueue_struct *wq;
> > +	struct list_head privs;
> > +	struct rw_semaphore privs_lock;
> > +} pvcalls_back_global;
> > +
> > +static void pvcalls_back_ioworker(struct work_struct *work)
> > +{
> > +}
> > +
> >  static int pvcalls_back_probe(struct xenbus_device *dev,
> >  			      const struct xenbus_device_id *id)
> >  {
> > @@ -59,3 +79,47 @@ static int pvcalls_back_uevent(struct xenbus_device *xdev,
> >  	.uevent = pvcalls_back_uevent,
> >  	.otherend_changed = pvcalls_back_changed,
> >  };
> > +
> > +static int __init pvcalls_back_init(void)
> > +{
> > +	int ret, i, cpu;
> > +
> > +	if (!xen_domain())
> > +		return -ENODEV;
> > +
> > +	ret = xenbus_register_backend(&pvcalls_back_driver);
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	init_rwsem(&pvcalls_back_global.privs_lock);
> > +	INIT_LIST_HEAD(&pvcalls_back_global.privs);
> > +	pvcalls_back_global.wq = alloc_workqueue("pvcalls_io", 0, 0);
> > +	if (!pvcalls_back_global.wq)
> > +		goto error;
> > +	pvcalls_back_global.nr_ioworkers = num_online_cpus();
> 
> Really? Recently I cam across a system with 640 dom0 cpus. I don't think
> we want 640 workers initialized when loading the backend module. I'd
> prefer one or a few workers per connected frontend.

I think we want to keep the ioworker allocation based on the number of
vcpus: we do not want more ioworkers than vcpus because that is a
waste of resources and leads to worse performance. Also, given that
they do memcpy's, I think it is a good idea to bind them to vcpus (and
pin vcpus to pcpus) to get the best performance.

However, you have a point there: we need to handle systems with an
extremely large number of Dom0 vcpus. I suggest we introduce an
upper limit for the number of ioworkers. Something like:

#define MAX_IOWORKERS 64
nr_ioworkers = min(MAX_IOWORKERS, num_online_cpus())

MAX_IOWORKERS could be configurable via a command line option.
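
E.g., just a sketch:

	static unsigned int max_ioworkers = 64;
	module_param(max_ioworkers, uint, 0644);
	MODULE_PARM_DESC(max_ioworkers,
			 "Maximum number of pvcalls ioworkers (default 64)");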


> > +	pvcalls_back_global.ioworkers = kzalloc(
> > +		sizeof(*pvcalls_back_global.ioworkers) *
> > +		pvcalls_back_global.nr_ioworkers, GFP_KERNEL);
> 
> kcalloc()?

I'll make the change


> > +	if (!pvcalls_back_global.ioworkers)
> > +		goto error;
> > +	i = 0;
> > +	for_each_online_cpu(cpu) {
> > +		pvcalls_back_global.ioworkers[i].num = i;
> > +		atomic_set(&pvcalls_back_global.ioworkers[i].io, 1);
> > +		spin_lock_init(&pvcalls_back_global.ioworkers[i].lock);
> > +		INIT_LIST_HEAD(&pvcalls_back_global.ioworkers[i].wqs);
> > +		INIT_WORK(&pvcalls_back_global.ioworkers[i].register_work,
> > +			pvcalls_back_ioworker);
> > +		i++;
> > +	}
> > +	return 0;
> > +
> > +error:
> > +	if (pvcalls_back_global.wq)
> > +		destroy_workqueue(pvcalls_back_global.wq);
> > +	xenbus_unregister_driver(&pvcalls_back_driver);
> > +	kfree(pvcalls_back_global.ioworkers);
> > +	memset(&pvcalls_back_global, 0, sizeof(pvcalls_back_global));
> > +	return -ENOMEM;
> > +}
> > +module_init(pvcalls_back_init);
> > 
> 
> Juergen
> 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 03/18] xen/pvcalls: initialize the module and register the xenbus backend
  2017-05-16  1:28     ` Boris Ostrovsky
  2017-05-16 20:05       ` Stefano Stabellini
@ 2017-05-16 20:05       ` Stefano Stabellini
  2017-05-16 20:22         ` Stefano Stabellini
  2017-05-16 20:22         ` Stefano Stabellini
  1 sibling, 2 replies; 81+ messages in thread
From: Stefano Stabellini @ 2017-05-16 20:05 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Stefano Stabellini, xen-devel, linux-kernel, jgross, Stefano Stabellini

On Mon, 15 May 2017, Boris Ostrovsky wrote:
> On 05/15/2017 04:35 PM, Stefano Stabellini wrote:
> > The pvcalls backend has one ioworker per cpu: the ioworkers are
> > implemented as a cpu bound workqueue, and will deal with the actual
> > socket and data ring reads/writes.
> > 
> > ioworkers are global: we only have one set for all the frontends. They
> > process requests on their wqs list in order, once they are done with a
> > request, they'll remove it from the list. A spinlock is used for
> > protecting the list. Each ioworker is bound to a different cpu to
> > maximize throughput.
> > 
> > Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> > CC: boris.ostrovsky@oracle.com
> > CC: jgross@suse.com
> > ---
> >  drivers/xen/pvcalls-back.c | 64
> > ++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 64 insertions(+)
> > 
> > diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
> > index 2dbf7d8..46a889a 100644
> > --- a/drivers/xen/pvcalls-back.c
> > +++ b/drivers/xen/pvcalls-back.c
> > @@ -25,6 +25,26 @@
> >  #include <xen/xenbus.h>
> >  #include <xen/interface/io/pvcalls.h>
> > 
> > +struct pvcalls_ioworker {
> > +	struct work_struct register_work;
> > +	atomic_t io;
> > +	struct list_head wqs;
> > +	spinlock_t lock;
> > +	int num;
> > +};
> > +
> > +struct pvcalls_back_global {
> > +	struct pvcalls_ioworker *ioworkers;
> > +	int nr_ioworkers;
> > +	struct workqueue_struct *wq;
> > +	struct list_head privs;
> > +	struct rw_semaphore privs_lock;
> 
> Is there a reason why these are called "privs"?

I realize it is a silly name :-)
It is called "privs" because it is a list of "priv" structures, where
priv is the private per-frontend data structure. I could call it
"frontends"?


> And why are you using a rw semaphore --- I only noticed two instances of use
> and both are writes.

Yes, this is wrong, a legacy from a previous version of the codebase.
A simple spinlock should suffice for this use case.
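
E.g., a sketch (using the "frontends" rename from above):

	spin_lock(&pvcalls_back_global.frontends_lock);
	list_add_tail(&priv->list, &pvcalls_back_global.frontends);
	spin_unlock(&pvcalls_back_global.frontends_lock);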


> > +} pvcalls_back_global;
> > +
> > +static void pvcalls_back_ioworker(struct work_struct *work)
> > +{
> > +}
> > +
> >  static int pvcalls_back_probe(struct xenbus_device *dev,
> >  			      const struct xenbus_device_id *id)
> >  {
> > @@ -59,3 +79,47 @@ static int pvcalls_back_uevent(struct xenbus_device
> > *xdev,
> >  	.uevent = pvcalls_back_uevent,
> >  	.otherend_changed = pvcalls_back_changed,
> >  };
> > +
> > +static int __init pvcalls_back_init(void)
> > +{
> > +	int ret, i, cpu;
> > +
> > +	if (!xen_domain())
> > +		return -ENODEV;
> > +
> > +	ret = xenbus_register_backend(&pvcalls_back_driver);
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	init_rwsem(&pvcalls_back_global.privs_lock);
> > +	INIT_LIST_HEAD(&pvcalls_back_global.privs);
> > +	pvcalls_back_global.wq = alloc_workqueue("pvcalls_io", 0, 0);
> > +	if (!pvcalls_back_global.wq)
> > +		goto error;
> > +	pvcalls_back_global.nr_ioworkers = num_online_cpus();
> 
> 
> Should nr_ioworkers be updated on CPU hot(un)plug?

I thought about it, but I don't think it is worth introducing the
complexity needed to deal with dynamic ioworker allocation.


 
> > +	pvcalls_back_global.ioworkers = kzalloc(
> > +		sizeof(*pvcalls_back_global.ioworkers) *
> > +		pvcalls_back_global.nr_ioworkers, GFP_KERNEL);
> > +	if (!pvcalls_back_global.ioworkers)
> > +		goto error;
> > +	i = 0;
> > +	for_each_online_cpu(cpu) {
> > +		pvcalls_back_global.ioworkers[i].num = i;
> > +		atomic_set(&pvcalls_back_global.ioworkers[i].io, 1);
> > +		spin_lock_init(&pvcalls_back_global.ioworkers[i].lock);
> > +		INIT_LIST_HEAD(&pvcalls_back_global.ioworkers[i].wqs);
> > +		INIT_WORK(&pvcalls_back_global.ioworkers[i].register_work,
> > +			pvcalls_back_ioworker);
> > +		i++;
> > +	}
> > +	return 0;
> > +
> > +error:
> > +	if (pvcalls_back_global.wq)
> > +		destroy_workqueue(pvcalls_back_global.wq);
> > +	xenbus_unregister_driver(&pvcalls_back_driver);
> > +	kfree(pvcalls_back_global.ioworkers);
> > +	memset(&pvcalls_back_global, 0, sizeof(pvcalls_back_global));
> > +	return -ENOMEM;
> 
> This routine could use more newlines. (and in other patches too)

I'll sprinkle some around


> > +}
> > +module_init(pvcalls_back_init);

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 04/18] xen/pvcalls: xenbus state handling
  2017-05-16  1:34       ` Boris Ostrovsky
  (?)
  (?)
@ 2017-05-16 20:11       ` Stefano Stabellini
  -1 siblings, 0 replies; 81+ messages in thread
From: Stefano Stabellini @ 2017-05-16 20:11 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Stefano Stabellini, xen-devel, linux-kernel, jgross, Stefano Stabellini

On Mon, 15 May 2017, Boris Ostrovsky wrote:
> On 05/15/2017 04:35 PM, Stefano Stabellini wrote:
> > Introduce the code to handle xenbus state changes.
> > 
> > Implement the probe function for the pvcalls backend. Write the
> > supported versions, max-page-order and function-calls nodes to xenstore,
> > as required by the protocol.
> > 
> > Introduce stub functions for disconnecting/connecting to a frontend.
> > 
> > Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> > CC: boris.ostrovsky@oracle.com
> > CC: jgross@suse.com
> > ---
> >  drivers/xen/pvcalls-back.c | 133
> > +++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 133 insertions(+)
> > 
> > diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
> > index 46a889a..86eca19 100644
> > --- a/drivers/xen/pvcalls-back.c
> > +++ b/drivers/xen/pvcalls-back.c
> > @@ -25,6 +25,9 @@
> >  #include <xen/xenbus.h>
> >  #include <xen/interface/io/pvcalls.h>
> > 
> > +#define PVCALLS_VERSIONS "1"
> > +#define MAX_RING_ORDER XENBUS_MAX_RING_GRANT_ORDER
> > +
> >  struct pvcalls_ioworker {
> >  	struct work_struct register_work;
> >  	atomic_t io;
> > @@ -45,15 +48,145 @@ static void pvcalls_back_ioworker(struct work_struct
> > *work)
> >  {
> >  }
> > 
> > +static int backend_connect(struct xenbus_device *dev)
> > +{
> > +	return 0;
> > +}
> > +
> > +static int backend_disconnect(struct xenbus_device *dev)
> > +{
> > +	return 0;
> > +}
> > +
> >  static int pvcalls_back_probe(struct xenbus_device *dev,
> >  			      const struct xenbus_device_id *id)
> >  {
> > +	int err;
> > +
> > +	err = xenbus_printf(XBT_NIL, dev->nodename, "versions", "%s",
> > +			    PVCALLS_VERSIONS);
> > +	if (err) {
> > +		pr_warn("%s write out 'version' failed\n", __func__);
> > +		return -EINVAL;
> 
> Why not return err? (below too)

Yeah, I'll make the change.


> > +	}
> > +
> > +	err = xenbus_printf(XBT_NIL, dev->nodename, "max-page-order", "%u",
> > +			    MAX_RING_ORDER);
> > +	if (err) {
> > +		pr_warn("%s write out 'max-page-order' failed\n", __func__);
> > +		return -EINVAL;
> > +	}
> > +
> > +	/* "1" means socket, connect, release, bind, listen, accept and poll*/
> > +	err = xenbus_printf(XBT_NIL, dev->nodename, "function-calls", "1");
> 
> 
> Should "1" be defined in the (public) header file?

Fair enough, it makes sense.
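
Something like this in the header, say (name is just a suggestion):

	/* "1" means socket, connect, release, bind, listen, accept and poll */
	#define XEN_PVCALLS_FUNCTION_CALLS "1"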

 
> > +	if (err) {
> > +		pr_warn("%s write out 'function-calls' failed\n", __func__);
> > +		return -EINVAL;
> > +	}
> > +
> > +	err = xenbus_switch_state(dev, XenbusStateInitWait);
> > +	if (err)
> > +		return err;
> > +
> >  	return 0;
> >  }
> > 
> > +static void set_backend_state(struct xenbus_device *dev,
> > +			      enum xenbus_state state)
> > +{
> > +	while (dev->state != state) {
> > +		switch (dev->state) {
> > +		case XenbusStateClosed:
> > +			switch (state) {
> > +			case XenbusStateInitWait:
> > +			case XenbusStateConnected:
> > +				xenbus_switch_state(dev, XenbusStateInitWait);
> > +				break;
> > +			case XenbusStateClosing:
> > +				xenbus_switch_state(dev, XenbusStateClosing);
> > +				break;
> > +			default:
> > +				__WARN();
> > +			}
> > +			break;
> > +		case XenbusStateInitWait:
> > +		case XenbusStateInitialised:
> > +			switch (state) {
> > +			case XenbusStateConnected:
> > +				backend_connect(dev);
> > +				xenbus_switch_state(dev,
> > XenbusStateConnected);
> > +				break;
> > +			case XenbusStateClosing:
> > +			case XenbusStateClosed:
> > +				xenbus_switch_state(dev, XenbusStateClosing);
> > +				break;
> > +			default:
> > +				__WARN();
> > +			}
> > +			break;
> > +		case XenbusStateConnected:
> > +			switch (state) {
> > +			case XenbusStateInitWait:
> > +			case XenbusStateClosing:
> > +			case XenbusStateClosed:
> > +				down_write(&pvcalls_back_global.privs_lock);
> > +				backend_disconnect(dev);
> > +				up_write(&pvcalls_back_global.privs_lock);
> 
> 
> Unless you plan to have more stuff under the semaphore, I'd consider putting
> them in backend_disconnect().

Yes, there will be more things in pvcalls_back_fin (the function that
implements module_exit in patch #14).


> > +				xenbus_switch_state(dev, XenbusStateClosing);
> > +				break;
> > +			default:
> > +				__WARN();
> > +			}
> > +			break;
> > +		case XenbusStateClosing:
> > +			switch (state) {
> > +			case XenbusStateInitWait:
> > +			case XenbusStateConnected:
> > +			case XenbusStateClosed:
> > +				xenbus_switch_state(dev, XenbusStateClosed);
> > +				break;
> > +			default:
> > +				__WARN();
> > +			}
> > +			break;
> > +		default:
> > +			__WARN();
> > +		}
> > +	}
> > +}
> > +
> >  static void pvcalls_back_changed(struct xenbus_device *dev,
> >  				 enum xenbus_state frontend_state)
> >  {
> > +	switch (frontend_state) {
> > +	case XenbusStateInitialising:
> > +		set_backend_state(dev, XenbusStateInitWait);
> > +		break;
> > +
> > +	case XenbusStateInitialised:
> > +	case XenbusStateConnected:
> > +		set_backend_state(dev, XenbusStateConnected);
> > +		break;
> > +
> > +	case XenbusStateClosing:
> > +		set_backend_state(dev, XenbusStateClosing);
> > +		break;
> > +
> > +	case XenbusStateClosed:
> > +		set_backend_state(dev, XenbusStateClosed);
> > +		if (xenbus_dev_is_online(dev))
> > +			break;
> > +		/* fall through if not online */
> > +	case XenbusStateUnknown:
> > +		set_backend_state(dev, XenbusStateClosed);
> 
> 
> You are setting XenbusStateClosed twice in case of fallthrough.

I'll fix it.
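
One possible shape for the fix (untested sketch): only switch to Closed
once on the fall-through path.

	case XenbusStateClosed:
		set_backend_state(dev, XenbusStateClosed);
		if (xenbus_dev_is_online(dev))
			break;
		/* fall through if not online */
	case XenbusStateUnknown:
		/* don't switch to Closed again after falling through */
		if (dev->state != XenbusStateClosed)
			set_backend_state(dev, XenbusStateClosed);
		device_unregister(&dev->dev);
		break;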


> > +		device_unregister(&dev->dev);
> > +		break;
> > +
> > +	default:
> > +		xenbus_dev_fatal(dev, -EINVAL, "saw state %d at frontend",
> > +				 frontend_state);
> > +		break;
> > +	}
> >  }
> > 
> >  static int pvcalls_back_remove(struct xenbus_device *dev)
> > 
> 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 03/18] xen/pvcalls: initialize the module and register the xenbus backend
  2017-05-16 20:05       ` Stefano Stabellini
  2017-05-16 20:22         ` Stefano Stabellini
@ 2017-05-16 20:22         ` Stefano Stabellini
  1 sibling, 0 replies; 81+ messages in thread
From: Stefano Stabellini @ 2017-05-16 20:22 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Boris Ostrovsky, xen-devel, linux-kernel, jgross, Stefano Stabellini

On Tue, 16 May 2017, Stefano Stabellini wrote:
> > And why are you using a rw semaphore --- I only noticed two instances of use
> > and both are writes.
> 
> Yes, this is wrong, legacy from a previous version of the codebase. A
> simple spin_lock should suffice for this use-case.

I replied too quickly: it is best as a semaphore because the functions
within the critical regions can cause a reschedule. But there is no need
to use a rw_semaphore, so I'll switch it to a regular semaphore.
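
A sketch of the conversion (untested):

	#include <linux/semaphore.h>

	struct semaphore privs_lock;	/* replaces the rw_semaphore */

	sema_init(&pvcalls_back_global.privs_lock, 1);

	down(&pvcalls_back_global.privs_lock);	/* was down_write() */
	list_add_tail(&priv->list, &pvcalls_back_global.privs);
	up(&pvcalls_back_global.privs_lock);	/* was up_write() */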

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 05/18] xen/pvcalls: connect to a frontend
  2017-05-16  1:52     ` Boris Ostrovsky
  2017-05-16 20:23       ` Stefano Stabellini
@ 2017-05-16 20:23       ` Stefano Stabellini
  2017-05-16 20:38         ` Stefano Stabellini
  2017-05-16 20:38         ` Stefano Stabellini
  1 sibling, 2 replies; 81+ messages in thread
From: Stefano Stabellini @ 2017-05-16 20:23 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Stefano Stabellini, xen-devel, linux-kernel, jgross, Stefano Stabellini

On Mon, 15 May 2017, Boris Ostrovsky wrote:
> On 05/15/2017 04:35 PM, Stefano Stabellini wrote:
> > Introduce a per-frontend data structure named pvcalls_back_priv. It
> > contains pointers to the command ring, its event channel, a list of
> > active sockets and a tree of passive sockets (passing sockets need to be
> > looked up from the id on listen, accept and poll commands, while active
> > sockets only on release).
> 
> It would be useful to put this into a comment in pvcalls_back_priv definition.

I'll do that.
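
Roughly by folding the commit message text into the definition, e.g.:

	struct pvcalls_back_priv {
		struct list_head list;	/* entry in pvcalls_back_global.privs */
		struct xenbus_device *dev;
		struct xen_pvcalls_sring *sring;	/* command ring */
		struct xen_pvcalls_back_ring ring;
		int irq;	/* event channel of the command ring */
		/* active sockets, only looked up from the id on release */
		struct list_head socket_mappings;
		/*
		 * passive sockets, looked up from the id on listen,
		 * accept and poll commands
		 */
		struct radix_tree_root socketpass_mappings;
		struct rw_semaphore pvcallss_lock;	/* protects both lists */
		atomic_t work;
		struct workqueue_struct *wq;	/* parses/executes commands */
		struct work_struct register_work;
	};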


> > It also has an unbound workqueue to schedule the work of parsing and
> > executing commands on the command ring. pvcallss_lock protects the two
> > lists. In pvcalls_back_global, keep a list of connected frontends.
> > 
> > Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> > CC: boris.ostrovsky@oracle.com
> > CC: jgross@suse.com
> > ---
> >  drivers/xen/pvcalls-back.c | 87
> > ++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 87 insertions(+)
> > 
> > diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
> > index 86eca19..876e577 100644
> > --- a/drivers/xen/pvcalls-back.c
> > +++ b/drivers/xen/pvcalls-back.c
> > @@ -44,13 +44,100 @@ struct pvcalls_back_global {
> >  	struct rw_semaphore privs_lock;
> >  } pvcalls_back_global;
> > 
> > +struct pvcalls_back_priv {
> > +	struct list_head list;
> > +	struct xenbus_device *dev;
> > +	struct xen_pvcalls_sring *sring;
> > +	struct xen_pvcalls_back_ring ring;
> > +	int irq;
> > +	struct list_head socket_mappings;
> > +	struct radix_tree_root socketpass_mappings;
> > +	struct rw_semaphore pvcallss_lock;
> 
> Same question as before regarding using rw semaphore --- I only see
> down/up_writes.

And again, you are right. I'll switch it to a regular semaphore.


> And what does the name (pvcallss) stand for?
> 
> 
> > +	atomic_t work;
> > +	struct workqueue_struct *wq;
> > +	struct work_struct register_work;
> > +};
> > +
> >  static void pvcalls_back_ioworker(struct work_struct *work)
> >  {
> >  }
> > 
> > +static void pvcalls_back_work(struct work_struct *work)
> > +{
> > +}
> > +
> > +static irqreturn_t pvcalls_back_event(int irq, void *dev_id)
> > +{
> > +	return IRQ_HANDLED;
> > +}
> > +
> >  static int backend_connect(struct xenbus_device *dev)
> >  {
> > +	int err, evtchn;
> > +	grant_ref_t ring_ref;
> > +	void *addr = NULL;
> > +	struct pvcalls_back_priv *priv = NULL;
> > +
> > +	priv = kzalloc(sizeof(struct pvcalls_back_priv), GFP_KERNEL);
> > +	if (!priv)
> > +		return -ENOMEM;
> > +
> > +	err = xenbus_scanf(XBT_NIL, dev->otherend, "port", "%u",
> > +			   &evtchn);
> > +	if (err != 1) {
> > +		err = -EINVAL;
> > +		xenbus_dev_fatal(dev, err, "reading %s/event-channel",
> > +				 dev->otherend);
> > +		goto error;
> > +	}
> > +
> > +	err = xenbus_scanf(XBT_NIL, dev->otherend, "ring-ref", "%u",
> > &ring_ref);
> > +	if (err != 1) {
> > +		err = -EINVAL;
> > +		xenbus_dev_fatal(dev, err, "reading %s/ring-ref",
> > +				 dev->otherend);
> > +		goto error;
> > +	}
> > +
> > +	err = xenbus_map_ring_valloc(dev, &ring_ref, 1, &addr);
> > +	if (err < 0)
> > +		goto error;
> 
> 
> I'd move this closer to first use, below.

Sure


> > +
> > +	err = bind_interdomain_evtchn_to_irqhandler(dev->otherend_id, evtchn,
> > +						    pvcalls_back_event, 0,
> > +						    "pvcalls-backend", dev);
> > +	if (err < 0)
> > +		goto error;
> > +
> > +	priv->wq = alloc_workqueue("pvcalls_back_wq", WQ_UNBOUND, 1);
> > +	if (!priv->wq) {
> > +		err = -ENOMEM;
> > +		goto error;
> > +	}
> > +	INIT_WORK(&priv->register_work, pvcalls_back_work);
> > +	priv->dev = dev;
> > +	priv->sring = addr;
> > +	BACK_RING_INIT(&priv->ring, priv->sring, XEN_PAGE_SIZE * 1);
> > +	priv->irq = err;
> > +	INIT_LIST_HEAD(&priv->socket_mappings);
> > +	INIT_RADIX_TREE(&priv->socketpass_mappings, GFP_KERNEL);
> > +	init_rwsem(&priv->pvcallss_lock);
> > +	dev_set_drvdata(&dev->dev, priv);
> > +	down_write(&pvcalls_back_global.privs_lock);
> > +	list_add_tail(&priv->list, &pvcalls_back_global.privs);
> > +	up_write(&pvcalls_back_global.privs_lock);
> > +	queue_work(priv->wq, &priv->register_work);
> > +
> >  	return 0;
> > +
> > + error:
> > +	if (addr != NULL)
> > +		xenbus_unmap_ring_vfree(dev, addr);
> > +	if (priv->wq)
> > +		destroy_workqueue(priv->wq);
> > +	unbind_from_irqhandler(priv->irq, dev);
> > +	kfree(priv);
> > +	return err;
> >  }
> > 
> >  static int backend_disconnect(struct xenbus_device *dev)
> > 
> 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 05/18] xen/pvcalls: connect to a frontend
  2017-05-16 20:23       ` Stefano Stabellini
@ 2017-05-16 20:38         ` Stefano Stabellini
  2017-05-16 20:38         ` Stefano Stabellini
  1 sibling, 0 replies; 81+ messages in thread
From: Stefano Stabellini @ 2017-05-16 20:38 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Boris Ostrovsky, xen-devel, linux-kernel, jgross, Stefano Stabellini

On Tue, 16 May 2017, Stefano Stabellini wrote:
> > > diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
> > > index 86eca19..876e577 100644
> > > --- a/drivers/xen/pvcalls-back.c
> > > +++ b/drivers/xen/pvcalls-back.c
> > > @@ -44,13 +44,100 @@ struct pvcalls_back_global {
> > >  	struct rw_semaphore privs_lock;
> > >  } pvcalls_back_global;
> > > 
> > > +struct pvcalls_back_priv {
> > > +	struct list_head list;
> > > +	struct xenbus_device *dev;
> > > +	struct xen_pvcalls_sring *sring;
> > > +	struct xen_pvcalls_back_ring ring;
> > > +	int irq;
> > > +	struct list_head socket_mappings;
> > > +	struct radix_tree_root socketpass_mappings;
> > > +	struct rw_semaphore pvcallss_lock;
> > 
> > Same question as before regarding using rw semaphore --- I only see
> > down/up_writes.
> 
> And again, you are right. I'll switch it to a regular semaphore.
> 
> 
> > And what does the name (pvcallss) stand for?

It stands for socket lock. I'll rename it to socket_lock.
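
E.g. (a sketch, together with the semaphore change discussed above):

	struct semaphore socket_lock;	/* was pvcallss_lock; protects
					 * socket_mappings and
					 * socketpass_mappings */

	down(&priv->socket_lock);
	list_add_tail(&map->list, &priv->socket_mappings);
	up(&priv->socket_lock);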


> > 
> > > +	atomic_t work;
> > > +	struct workqueue_struct *wq;
> > > +	struct work_struct register_work;
> > > +};

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 07/18] xen/pvcalls: implement socket command
  2017-05-16  2:12     ` Boris Ostrovsky
  2017-05-16 20:45       ` Stefano Stabellini
@ 2017-05-16 20:45       ` Stefano Stabellini
  1 sibling, 0 replies; 81+ messages in thread
From: Stefano Stabellini @ 2017-05-16 20:45 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Stefano Stabellini, xen-devel, linux-kernel, jgross, Stefano Stabellini

On Mon, 15 May 2017, Boris Ostrovsky wrote:
> On 05/15/2017 04:35 PM, Stefano Stabellini wrote:
> > Just reply with success to the other end for now. Delay the allocation
> > of the actual socket to bind and/or connect.
> > 
> > Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> > CC: boris.ostrovsky@oracle.com
> > CC: jgross@suse.com
> > ---
> >  drivers/xen/pvcalls-back.c | 31 ++++++++++++++++++++++++++++++-
> >  1 file changed, 30 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
> > index 2b2a49a..2eae096 100644
> > --- a/drivers/xen/pvcalls-back.c
> > +++ b/drivers/xen/pvcalls-back.c
> > @@ -12,12 +12,17 @@
> >   * GNU General Public License for more details.
> >   */
> > 
> > +#include <linux/inet.h>
> >  #include <linux/kthread.h>
> >  #include <linux/list.h>
> >  #include <linux/radix-tree.h>
> >  #include <linux/module.h>
> >  #include <linux/rwsem.h>
> >  #include <linux/wait.h>
> > +#include <net/sock.h>
> > +#include <net/inet_common.h>
> > +#include <net/inet_connection_sock.h>
> > +#include <net/request_sock.h>
> > 
> >  #include <xen/events.h>
> >  #include <xen/grant_table.h>
> > @@ -65,7 +70,31 @@ static void pvcalls_back_ioworker(struct work_struct
> > *work)
> >  static int pvcalls_back_socket(struct xenbus_device *dev,
> >  		struct xen_pvcalls_request *req)
> >  {
> > -	return 0;
> > +	struct pvcalls_back_priv *priv;
> > +	int ret;
> > +	struct xen_pvcalls_response *rsp;
> > +
> > +	if (dev == NULL)
> > +		return 0;
> > +	priv = dev_get_drvdata(&dev->dev);
> 
> This is inconsistent with pvcalls_back_event() tests, where you check both for
> NULL. OTOH, I am not sure a check is needed at all since you've just tested
> these in pvcalls_back_event().

That's because priv cannot be NULL at this stage: it was allocated
before any work is queued on priv->wq. In pvcalls_back_event, on the
other hand, I have been more careful, to guard against spurious
(erroneous) notifications.

I agree that I could remove this dev == NULL check here, and in the
other command handlers. I'll do that.
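
So the handler prologue would reduce to something like (sketch):

	static int pvcalls_back_socket(struct xenbus_device *dev,
				       struct xen_pvcalls_request *req)
	{
		/* dev and priv were already checked in pvcalls_back_event() */
		struct pvcalls_back_priv *priv = dev_get_drvdata(&dev->dev);
		struct xen_pvcalls_response *rsp;
		int ret;
		/* ... rest unchanged ... */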


> > +
> > +	if (req->u.socket.domain != AF_INET ||
> > +	    req->u.socket.type != SOCK_STREAM ||
> > +	    (req->u.socket.protocol != 0 &&
> > +	     req->u.socket.protocol != AF_INET))
> > +		ret = -EAFNOSUPPORT;
> > +	else
> > +		ret = 0;
> > +
> > +	/* leave the actual socket allocation for later */
> > +
> > +	rsp = RING_GET_RESPONSE(&priv->ring, priv->ring.rsp_prod_pvt++);
> > +	rsp->req_id = req->req_id;
> > +	rsp->cmd = req->cmd;
> > +	rsp->u.socket.id = req->u.socket.id;
> > +	rsp->ret = ret;
> > +
> > +	return 1;
> >  }
> > 
> >  static int pvcalls_back_connect(struct xenbus_device *dev,
> > 
> 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 06/18] xen/pvcalls: handle commands from the frontend
  2017-05-16  2:06     ` Boris Ostrovsky
@ 2017-05-16 20:57       ` Stefano Stabellini
  2017-05-16 20:57       ` Stefano Stabellini
  1 sibling, 0 replies; 81+ messages in thread
From: Stefano Stabellini @ 2017-05-16 20:57 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Stefano Stabellini, xen-devel, linux-kernel, jgross, Stefano Stabellini

On Mon, 15 May 2017, Boris Ostrovsky wrote:
> On 05/15/2017 04:35 PM, Stefano Stabellini wrote:
> > When the other end notifies us that there are commands to be read
> > (pvcalls_back_event), wake up the backend thread to parse the command.
> > 
> > The command ring works like most other Xen rings, so use the usual
> > ring macros to read and write to it. The functions implementing the
> > commands are empty stubs for now.
> > 
> > Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> > CC: boris.ostrovsky@oracle.com
> > CC: jgross@suse.com
> > ---
> >  drivers/xen/pvcalls-back.c | 115
> > +++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 115 insertions(+)
> > 
> > diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
> > index 876e577..2b2a49a 100644
> > --- a/drivers/xen/pvcalls-back.c
> > +++ b/drivers/xen/pvcalls-back.c
> > @@ -62,12 +62,127 @@ static void pvcalls_back_ioworker(struct work_struct
> > *work)
> >  {
> >  }
> > 
> > +static int pvcalls_back_socket(struct xenbus_device *dev,
> > +		struct xen_pvcalls_request *req)
> > +{
> > +	return 0;
> > +}
> > +
> > +static int pvcalls_back_connect(struct xenbus_device *dev,
> > +				struct xen_pvcalls_request *req)
> > +{
> > +	return 0;
> > +}
> > +
> > +static int pvcalls_back_release(struct xenbus_device *dev,
> > +				struct xen_pvcalls_request *req)
> > +{
> > +	return 0;
> > +}
> > +
> > +static int pvcalls_back_bind(struct xenbus_device *dev,
> > +			     struct xen_pvcalls_request *req)
> > +{
> > +	return 0;
> > +}
> > +
> > +static int pvcalls_back_listen(struct xenbus_device *dev,
> > +			       struct xen_pvcalls_request *req)
> > +{
> > +	return 0;
> > +}
> > +
> > +static int pvcalls_back_accept(struct xenbus_device *dev,
> > +			       struct xen_pvcalls_request *req)
> > +{
> > +	return 0;
> > +}
> > +
> > +static int pvcalls_back_poll(struct xenbus_device *dev,
> > +			     struct xen_pvcalls_request *req)
> > +{
> > +	return 0;
> > +}
> > +
> > +static int pvcalls_back_handle_cmd(struct xenbus_device *dev,
> > +				   struct xen_pvcalls_request *req)
> > +{
> > +	int ret = 0;
> > +
> > +	switch (req->cmd) {
> > +	case PVCALLS_SOCKET:
> > +		ret = pvcalls_back_socket(dev, req);
> > +		break;
> > +	case PVCALLS_CONNECT:
> > +		ret = pvcalls_back_connect(dev, req);
> > +		break;
> > +	case PVCALLS_RELEASE:
> > +		ret = pvcalls_back_release(dev, req);
> > +		break;
> > +	case PVCALLS_BIND:
> > +		ret = pvcalls_back_bind(dev, req);
> > +		break;
> > +	case PVCALLS_LISTEN:
> > +		ret = pvcalls_back_listen(dev, req);
> > +		break;
> > +	case PVCALLS_ACCEPT:
> > +		ret = pvcalls_back_accept(dev, req);
> > +		break;
> > +	case PVCALLS_POLL:
> > +		ret = pvcalls_back_poll(dev, req);
> > +		break;
> > +	default:
> > +		ret = -ENOTSUPP;
> > +		break;
> > +	}
> > +	return ret;
> > +}
> > +
> >  static void pvcalls_back_work(struct work_struct *work)
> >  {
> > +	struct pvcalls_back_priv *priv = container_of(work,
> > +		struct pvcalls_back_priv, register_work);
> > +	int notify, notify_all = 0, more = 1;
> > +	struct xen_pvcalls_request req;
> > +	struct xenbus_device *dev = priv->dev;
> > +
> > +	atomic_set(&priv->work, 1);
> > +
> > +	while (more || !atomic_dec_and_test(&priv->work)) {
> > +		while (RING_HAS_UNCONSUMED_REQUESTS(&priv->ring)) {
> > +			RING_COPY_REQUEST(&priv->ring,
> > +					  priv->ring.req_cons++,
> > +					  &req);
> > +
> > +			if (pvcalls_back_handle_cmd(dev, &req) > 0) {
> 
> Can you make handlers make "traditional" returns, i.e. <0 on error and 0 on
> success? Or do you really need to distinguish 0 from >0?

Today < 0 means error, 0 means OK but no notifications required, 1 means
OK with notifications. Given that errors are returned to the other end
using the appropriate response field (we don't do anything with an error
in pvcalls_back_work), I could change this to:

-1: no need for notifications (both errors and regular conditions)
0:  notifications
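
With that convention the dispatch loop would read something like
(sketch, 0 meaning "response queued, push it and maybe notify"):

	if (pvcalls_back_handle_cmd(dev, &req) == 0) {
		RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(
			&priv->ring, notify);
		notify_all += notify;
	}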


> > +				RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(
> > +					&priv->ring, notify);
> > +				notify_all += notify;
> > +			}
> > +		}
> > +
> > +		if (notify_all)
> > +			notify_remote_via_irq(priv->irq);
> > +
> > +		RING_FINAL_CHECK_FOR_REQUESTS(&priv->ring, more);
> > +	}
> >  }
> > 
> >  static irqreturn_t pvcalls_back_event(int irq, void *dev_id)
> >  {
> > +	struct xenbus_device *dev = dev_id;
> > +	struct pvcalls_back_priv *priv = NULL;
> > +
> > +	if (dev == NULL)
> > +		return IRQ_HANDLED;
> > +
> > +	priv = dev_get_drvdata(&dev->dev);
> > +	if (priv == NULL)
> > +		return IRQ_HANDLED;
> 
> These two aren't errors?

They are meant to handle spurious event notifications. From the Linux
irq handling subsystem point of view, they are not errors.


> > +
> > +	atomic_inc(&priv->work);
> 
> Is this really needed? We have a new entry on the ring, so the outer loop in
> pvcalls_back_work() will pick this up (by setting 'more').

This is to avoid race conditions. A notification could be delivered
after RING_FINAL_CHECK_FOR_REQUESTS is called, returning more == 0, but
before pvcalls_back_work completes. In that case, without priv->work,
pvcalls_back_work wouldn't be rescheduled because it is still running
and the work would be left undone.
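
The lost-wakeup window, spelled out (hypothetical interleaving):

	/*
	 * pvcalls_back_work                 pvcalls_back_event
	 * -----------------                 ------------------
	 * RING_FINAL_CHECK_FOR_REQUESTS()
	 *   -> more == 0
	 *                                   frontend posts a request,
	 *                                   queue_work() is a no-op
	 *                                   (work item still running)
	 * outer loop exits
	 *   -> request never handled
	 *
	 * atomic_inc(&priv->work) makes the !atomic_dec_and_test(&priv->work)
	 * check fail, so the outer loop runs once more and picks the
	 * request up.
	 */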


> > +	queue_work(priv->wq, &priv->register_work);
> > +
> >  	return IRQ_HANDLED;
> >  }

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 08/18] xen/pvcalls: implement connect command
  2017-05-16  2:36     ` Boris Ostrovsky
  2017-05-16 21:02       ` Stefano Stabellini
@ 2017-05-16 21:02       ` Stefano Stabellini
  2017-05-16 21:56         ` Boris Ostrovsky
  2017-05-16 21:56         ` Boris Ostrovsky
  1 sibling, 2 replies; 81+ messages in thread
From: Stefano Stabellini @ 2017-05-16 21:02 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Stefano Stabellini, xen-devel, linux-kernel, jgross, Stefano Stabellini

On Mon, 15 May 2017, Boris Ostrovsky wrote:
> On 05/15/2017 04:36 PM, Stefano Stabellini wrote:
> > Allocate a socket. Keep track of socket <-> ring mappings with a new data
> > structure, called sock_mapping. Implement the connect command by calling
> > inet_stream_connect, and mapping the new indexes page and data ring.
> > Associate the socket to an ioworker randomly.
> > 
> > When an active socket is closed (sk_state_change), set in_error to
> > -ENOTCONN and notify the other end, as specified by the protocol.
> > 
> > sk_data_ready will be implemented later.
> > 
> > Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> > CC: boris.ostrovsky@oracle.com
> > CC: jgross@suse.com
> > ---
> >  drivers/xen/pvcalls-back.c | 145
> > +++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 145 insertions(+)
> > 
> > diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
> > index 2eae096..9ac1cf2 100644
> > --- a/drivers/xen/pvcalls-back.c
> > +++ b/drivers/xen/pvcalls-back.c
> > @@ -63,6 +63,29 @@ struct pvcalls_back_priv {
> >  	struct work_struct register_work;
> >  };
> > 
> > +struct sock_mapping {
> > +	struct list_head list;
> > +	struct list_head queue;
> 
> Since you have two lists it would be helpful if names were a bit more
> descriptive.
> 
> (and comments for at least some fields would be welcome too)

Yeah, you are right. list is used to add a sock_mapping to
priv->socket_mappings, the per-frontend list of active sockets. queue is
used to add it to the ioworker's list. I'll add a comment.
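
E.g. (sketch):

	struct sock_mapping {
		/* entry in priv->socket_mappings (active sockets) */
		struct list_head list;
		/* entry in the ioworker's list */
		struct list_head queue;
		/* ... */
	};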


> > +	struct pvcalls_back_priv *priv;
> > +	struct socket *sock;
> > +	int data_worker;
> > +	uint64_t id;
> > +	grant_ref_t ref;
> > +	struct pvcalls_data_intf *ring;
> > +	void *bytes;
> > +	struct pvcalls_data data;
> > +	uint32_t ring_order;
> > +	int irq;
> > +	atomic_t read;
> > +	atomic_t write;
> > +	atomic_t release;
> > +	void (*saved_data_ready)(struct sock *sk);
> > +};
> > +
> > +static irqreturn_t pvcalls_back_conn_event(int irq, void *sock_map);
> > +static int pvcalls_back_release_active(struct xenbus_device *dev,
> > +				       struct pvcalls_back_priv *priv,
> > +				       struct sock_mapping *map);
> >  static void pvcalls_back_ioworker(struct work_struct *work)
> >  {
> >  }
> > @@ -97,9 +120,126 @@ static int pvcalls_back_socket(struct xenbus_device
> > *dev,
> >  	return 1;
> >  }
> > 
> > +static void pvcalls_sk_state_change(struct sock *sock)
> > +{
> > +	struct sock_mapping *map = sock->sk_user_data;
> > +	struct pvcalls_data_intf *intf;
> > +
> > +	if (map == NULL)
> > +		return;
> > +
> > +	intf = map->ring;
> > +	intf->in_error = -ENOTCONN;
> > +	notify_remote_via_irq(map->irq);
> > +}
> > +
> > +static void pvcalls_sk_data_ready(struct sock *sock)
> > +{
> > +}
> > +
> >  static int pvcalls_back_connect(struct xenbus_device *dev,
> >  				struct xen_pvcalls_request *req)
> >  {
> > +	struct pvcalls_back_priv *priv;
> > +	int ret;
> > +	struct socket *sock;
> > +	struct sock_mapping *map = NULL;
> > +	void *page;
> > +	struct xen_pvcalls_response *rsp;
> > +
> > +	if (dev == NULL)
> > +		return 0;
> > +	priv = dev_get_drvdata(&dev->dev);
> > +
> > +	map = kzalloc(sizeof(*map), GFP_KERNEL);
> > +	if (map == NULL) {
> > +		ret = -ENOMEM;
> > +		goto out;
> > +	}
> > +	ret = sock_create(AF_INET, SOCK_STREAM, 0, &sock);
> > +	if (ret < 0) {
> > +		kfree(map);
> > +		goto out;
> > +	}
> > +	INIT_LIST_HEAD(&map->queue);
> > +	map->data_worker = get_random_int() %
> > pvcalls_back_global.nr_ioworkers;
> > +
> > +	map->priv = priv;
> > +	map->sock = sock;
> > +	map->id = req->u.connect.id;
> > +	map->ref = req->u.connect.ref;
> > +
> > +	ret = xenbus_map_ring_valloc(dev, &req->u.connect.ref, 1, &page);
> > +	if (ret < 0) {
> > +		sock_release(map->sock);
> > +		kfree(map);
> > +		goto out;
> > +	}
> > +	map->ring = page;
> > +	map->ring_order = map->ring->ring_order;
> > +	/* first read the order, then map the data ring */
> > +	virt_rmb();
> 
> 
> Not sure I understand what the barrier is for here. I don't think compiler
> will reorder ring_order access with the call.

It's to avoid using the live version of ring_order to map the data ring
pages (the other end could be changing that value at any time). We want
to be sure that the compiler doesn't optimize out map->ring_order and
use map->ring->ring_order instead.
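
One way to make the single read explicit (a sketch, on top of the
existing barrier):

	/* snapshot the frontend-writable value exactly once */
	map->ring_order = READ_ONCE(map->ring->ring_order);
	/* read the order before mapping the data ring it describes */
	virt_rmb();
	if (map->ring_order > MAX_RING_ORDER) {
		ret = -EFAULT;
		goto out;
	}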


> > +	if (map->ring_order > MAX_RING_ORDER) {
> > +		ret = -EFAULT;
> > +		goto out;
> > +	}
> 
> If the barrier is indeed needed this check belongs before it.

I don't think so, see above.


> 
> 
> > +	ret = xenbus_map_ring_valloc(dev, map->ring->ref,
> > +				     (1 << map->ring_order), &page);
> > +	if (ret < 0) {
> > +		sock_release(map->sock);
> > +		xenbus_unmap_ring_vfree(dev, map->ring);
> > +		kfree(map);
> > +		goto out;
> > +	}
> > +	map->bytes = page;
> > +
> > +	ret = bind_interdomain_evtchn_to_irqhandler(priv->dev->otherend_id,
> > +						    req->u.connect.evtchn,
> > +						    pvcalls_back_conn_event,
> > +						    0,
> > +						    "pvcalls-backend",
> > +						    map);
> > +	if (ret < 0) {
> > +		sock_release(map->sock);
> > +		kfree(map);
> > +		goto out;
> > +	}
> > +	map->irq = ret;
> > +
> > +	map->data.in = map->bytes;
> > +	map->data.out = map->bytes + XEN_FLEX_RING_SIZE(map->ring_order);
> > +
> > +	down_write(&priv->pvcallss_lock);
> > +	list_add_tail(&map->list, &priv->socket_mappings);
> > +	up_write(&priv->pvcallss_lock);
> > +
> > +	ret = inet_stream_connect(sock, (struct sockaddr
> > *)&req->u.connect.addr,
> > +				  req->u.connect.len, req->u.connect.flags);
> > +	if (ret < 0) {
> > +		pvcalls_back_release_active(dev, priv, map);
> > +	} else {
> > +		lock_sock(sock->sk);
> > +		map->saved_data_ready = sock->sk->sk_data_ready;
> > +		sock->sk->sk_user_data = map;
> > +		sock->sk->sk_data_ready = pvcalls_sk_data_ready;
> > +		sock->sk->sk_state_change = pvcalls_sk_state_change;
> > +		release_sock(sock->sk);
> > +	}
> > +
> > +out:
> > +	rsp = RING_GET_RESPONSE(&priv->ring, priv->ring.rsp_prod_pvt++);
> > +	rsp->req_id = req->req_id;
> > +	rsp->cmd = req->cmd;
> > +	rsp->u.connect.id = req->u.connect.id;
> > +	rsp->ret = ret;
> > +
> > +	return 1;
> > +}
> > +
> > +static int pvcalls_back_release_active(struct xenbus_device *dev,
> > +				       struct pvcalls_back_priv *priv,
> > +				       struct sock_mapping *map)
> > +{
> >  	return 0;
> >  }
> > 
> > @@ -215,6 +355,11 @@ static irqreturn_t pvcalls_back_event(int irq, void
> > *dev_id)
> >  	return IRQ_HANDLED;
> >  }
> > 
> > +static irqreturn_t pvcalls_back_conn_event(int irq, void *sock_map)
> > +{
> > +	return IRQ_HANDLED;
> > +}
> > +
> >  static int backend_connect(struct xenbus_device *dev)
> >  {
> >  	int err, evtchn;
> > 
> 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 08/18] xen/pvcalls: implement connect command
  2017-05-16  2:36     ` Boris Ostrovsky
@ 2017-05-16 21:02       ` Stefano Stabellini
  2017-05-16 21:02       ` Stefano Stabellini
  1 sibling, 0 replies; 81+ messages in thread
From: Stefano Stabellini @ 2017-05-16 21:02 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: jgross, Stefano Stabellini, Stefano Stabellini, linux-kernel, xen-devel

On Mon, 15 May 2017, Boris Ostrovsky wrote:
> On 05/15/2017 04:36 PM, Stefano Stabellini wrote:
> > Allocate a socket. Keep track of socket <-> ring mappings with a new data
> > structure, called sock_mapping. Implement the connect command by calling
> > inet_stream_connect, and mapping the new indexes page and data ring.
> > Associate the socket to an ioworker randomly.
> > 
> > When an active socket is closed (sk_state_change), set in_error to
> > -ENOTCONN and notify the other end, as specified by the protocol.
> > 
> > sk_data_ready will be implemented later.
> > 
> > Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> > CC: boris.ostrovsky@oracle.com
> > CC: jgross@suse.com
> > ---
> >  drivers/xen/pvcalls-back.c | 145
> > +++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 145 insertions(+)
> > 
> > diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
> > index 2eae096..9ac1cf2 100644
> > --- a/drivers/xen/pvcalls-back.c
> > +++ b/drivers/xen/pvcalls-back.c
> > @@ -63,6 +63,29 @@ struct pvcalls_back_priv {
> >  	struct work_struct register_work;
> >  };
> > 
> > +struct sock_mapping {
> > +	struct list_head list;
> > +	struct list_head queue;
> 
> Since you have two lists it would be helpful if names were a bit more
> descriptive.
> 
> (and comments for at least some fields would be welcome too)

Yeah, you are right. list is used to add sock_mapping to
priv->socket_mappings, the per frontend list of active sockets. queue is
used to add sock_mapping to the ioworker list. I'll add a comment.


> > +	struct pvcalls_back_priv *priv;
> > +	struct socket *sock;
> > +	int data_worker;
> > +	uint64_t id;
> > +	grant_ref_t ref;
> > +	struct pvcalls_data_intf *ring;
> > +	void *bytes;
> > +	struct pvcalls_data data;
> > +	uint32_t ring_order;
> > +	int irq;
> > +	atomic_t read;
> > +	atomic_t write;
> > +	atomic_t release;
> > +	void (*saved_data_ready)(struct sock *sk);
> > +};
> > +
> > +static irqreturn_t pvcalls_back_conn_event(int irq, void *sock_map);
> > +static int pvcalls_back_release_active(struct xenbus_device *dev,
> > +				       struct pvcalls_back_priv *priv,
> > +				       struct sock_mapping *map);
> >  static void pvcalls_back_ioworker(struct work_struct *work)
> >  {
> >  }
> > @@ -97,9 +120,126 @@ static int pvcalls_back_socket(struct xenbus_device
> > *dev,
> >  	return 1;
> >  }
> > 
> > +static void pvcalls_sk_state_change(struct sock *sock)
> > +{
> > +	struct sock_mapping *map = sock->sk_user_data;
> > +	struct pvcalls_data_intf *intf;
> > +
> > +	if (map == NULL)
> > +		return;
> > +
> > +	intf = map->ring;
> > +	intf->in_error = -ENOTCONN;
> > +	notify_remote_via_irq(map->irq);
> > +}
> > +
> > +static void pvcalls_sk_data_ready(struct sock *sock)
> > +{
> > +}
> > +
> >  static int pvcalls_back_connect(struct xenbus_device *dev,
> >  				struct xen_pvcalls_request *req)
> >  {
> > +	struct pvcalls_back_priv *priv;
> > +	int ret;
> > +	struct socket *sock;
> > +	struct sock_mapping *map = NULL;
> > +	void *page;
> > +	struct xen_pvcalls_response *rsp;
> > +
> > +	if (dev == NULL)
> > +		return 0;
> > +	priv = dev_get_drvdata(&dev->dev);
> > +
> > +	map = kzalloc(sizeof(*map), GFP_KERNEL);
> > +	if (map == NULL) {
> > +		ret = -ENOMEM;
> > +		goto out;
> > +	}
> > +	ret = sock_create(AF_INET, SOCK_STREAM, 0, &sock);
> > +	if (ret < 0) {
> > +		kfree(map);
> > +		goto out;
> > +	}
> > +	INIT_LIST_HEAD(&map->queue);
> > +	map->data_worker = get_random_int() %
> > pvcalls_back_global.nr_ioworkers;
> > +
> > +	map->priv = priv;
> > +	map->sock = sock;
> > +	map->id = req->u.connect.id;
> > +	map->ref = req->u.connect.ref;
> > +
> > +	ret = xenbus_map_ring_valloc(dev, &req->u.connect.ref, 1, &page);
> > +	if (ret < 0) {
> > +		sock_release(map->sock);
> > +		kfree(map);
> > +		goto out;
> > +	}
> > +	map->ring = page;
> > +	map->ring_order = map->ring->ring_order;
> > +	/* first read the order, then map the data ring */
> > +	virt_rmb();
> 
> 
> Not sure I understand what the barrier is for here. I don't think compiler
> will reorder ring_order access with the call.

It's to avoid using the live version of ring_order to map the data ring
pages (the other end could be changing that value at any time). We want
to be sure that the compiler doesn't optimize out map->ring_order and
use map->ring->ring_order instead.


> > +	if (map->ring_order > MAX_RING_ORDER) {
> > +		ret = -EFAULT;
> > +		goto out;
> > +	}
> 
> If the barrier is indeed needed this check belongs before it.

I don't think so, see above.


> 
> 
> > +	ret = xenbus_map_ring_valloc(dev, map->ring->ref,
> > +				     (1 << map->ring_order), &page);
> > +	if (ret < 0) {
> > +		sock_release(map->sock);
> > +		xenbus_unmap_ring_vfree(dev, map->ring);
> > +		kfree(map);
> > +		goto out;
> > +	}
> > +	map->bytes = page;
> > +
> > +	ret = bind_interdomain_evtchn_to_irqhandler(priv->dev->otherend_id,
> > +						    req->u.connect.evtchn,
> > +						    pvcalls_back_conn_event,
> > +						    0,
> > +						    "pvcalls-backend",
> > +						    map);
> > +	if (ret < 0) {
> > +		sock_release(map->sock);
> > +		kfree(map);
> > +		goto out;
> > +	}
> > +	map->irq = ret;
> > +
> > +	map->data.in = map->bytes;
> > +	map->data.out = map->bytes + XEN_FLEX_RING_SIZE(map->ring_order);
> > +
> > +	down_write(&priv->pvcallss_lock);
> > +	list_add_tail(&map->list, &priv->socket_mappings);
> > +	up_write(&priv->pvcallss_lock);
> > +
> > +	ret = inet_stream_connect(sock, (struct sockaddr
> > *)&req->u.connect.addr,
> > +				  req->u.connect.len, req->u.connect.flags);
> > +	if (ret < 0) {
> > +		pvcalls_back_release_active(dev, priv, map);
> > +	} else {
> > +		lock_sock(sock->sk);
> > +		map->saved_data_ready = sock->sk->sk_data_ready;
> > +		sock->sk->sk_user_data = map;
> > +		sock->sk->sk_data_ready = pvcalls_sk_data_ready;
> > +		sock->sk->sk_state_change = pvcalls_sk_state_change;
> > +		release_sock(sock->sk);
> > +	}
> > +
> > +out:
> > +	rsp = RING_GET_RESPONSE(&priv->ring, priv->ring.rsp_prod_pvt++);
> > +	rsp->req_id = req->req_id;
> > +	rsp->cmd = req->cmd;
> > +	rsp->u.connect.id = req->u.connect.id;
> > +	rsp->ret = ret;
> > +
> > +	return 1;
> > +}
> > +
> > +static int pvcalls_back_release_active(struct xenbus_device *dev,
> > +				       struct pvcalls_back_priv *priv,
> > +				       struct sock_mapping *map)
> > +{
> >  	return 0;
> >  }
> > 
> > @@ -215,6 +355,11 @@ static irqreturn_t pvcalls_back_event(int irq, void *dev_id)
> >  	return IRQ_HANDLED;
> >  }
> > 
> > +static irqreturn_t pvcalls_back_conn_event(int irq, void *sock_map)
> > +{
> > +	return IRQ_HANDLED;
> > +}
> > +
> >  static int backend_connect(struct xenbus_device *dev)
> >  {
> >  	int err, evtchn;
> > 
> 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 08/18] xen/pvcalls: implement connect command
  2017-05-16 21:02       ` Stefano Stabellini
  2017-05-16 21:56         ` Boris Ostrovsky
@ 2017-05-16 21:56         ` Boris Ostrovsky
  2017-05-18 19:10           ` Stefano Stabellini
  2017-05-18 19:10           ` Stefano Stabellini
  1 sibling, 2 replies; 81+ messages in thread
From: Boris Ostrovsky @ 2017-05-16 21:56 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, linux-kernel, jgross, Stefano Stabellini


>>> +	ret = xenbus_map_ring_valloc(dev, &req->u.connect.ref, 1, &page);
>>> +	if (ret < 0) {
>>> +		sock_release(map->sock);
>>> +		kfree(map);
>>> +		goto out;
>>> +	}
>>> +	map->ring = page;
>>> +	map->ring_order = map->ring->ring_order;
>>> +	/* first read the order, then map the data ring */
>>> +	virt_rmb();
>>
>> Not sure I understand what the barrier is for here. I don't think compiler
>> will reorder ring_order access with the call.
> It's to avoid using the live version of ring_order to map the data ring
> pages (the other end could be changing that value at any time). We want
> to be sure that the compiler doesn't optimize out map->ring_order and
> use map->ring->ring_order instead.

Wouldn't WRITE_ONCE(map->ring_order, map->ring->ring_order) be the right
primitive then?

And also: if the other side changes ring size, what are we mapping then?
It's obsolete by now.

-boris

>
>
>>> +	if (map->ring_order > MAX_RING_ORDER) {
>>> +		ret = -EFAULT;
>>> +		goto out;
>>> +	}
>> If the barrier is indeed needed this check belongs before it.
> I don't think so, see above.
>
>
>>
>>> +	ret = xenbus_map_ring_valloc(dev, map->ring->ref,
>>> +				     (1 << map->ring_order), &page);
>>> +	if (ret < 0) {
>>> +		sock_release(map->sock);
>>> +		xenbus_unmap_ring_vfree(dev, map->ring);
>>> +		kfree(map);
>>> +		goto out;
>>> +	}
>>> +	map->bytes = page;
>>>

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 03/18] xen/pvcalls: initialize the module and register the xenbus backend
  2017-05-16 19:58       ` Stefano Stabellini
  2017-05-17  5:21         ` Juergen Gross
@ 2017-05-17  5:21         ` Juergen Gross
  2017-05-18 21:18           ` Stefano Stabellini
  2017-05-18 21:18           ` Stefano Stabellini
  1 sibling, 2 replies; 81+ messages in thread
From: Juergen Gross @ 2017-05-17  5:21 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, linux-kernel, boris.ostrovsky, Stefano Stabellini

On 16/05/17 21:58, Stefano Stabellini wrote:
> On Tue, 16 May 2017, Juergen Gross wrote:
>> On 15/05/17 22:35, Stefano Stabellini wrote:
>>> The pvcalls backend has one ioworker per cpu: the ioworkers are
>>> implemented as a cpu bound workqueue, and will deal with the actual
>>> socket and data ring reads/writes.
>>>
>>> ioworkers are global: we only have one set for all the frontends. They
>>> process requests on their wqs list in order, once they are done with a
>>> request, they'll remove it from the list. A spinlock is used for
>>> protecting the list. Each ioworker is bound to a different cpu to
>>> maximize throughput.
>>>
>>> Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
>>> CC: boris.ostrovsky@oracle.com
>>> CC: jgross@suse.com
>>> ---
>>>  drivers/xen/pvcalls-back.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 64 insertions(+)
>>>
>>> diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
>>> index 2dbf7d8..46a889a 100644
>>> --- a/drivers/xen/pvcalls-back.c
>>> +++ b/drivers/xen/pvcalls-back.c
>>> @@ -25,6 +25,26 @@
>>>  #include <xen/xenbus.h>
>>>  #include <xen/interface/io/pvcalls.h>
>>>  
>>> +struct pvcalls_ioworker {
>>> +	struct work_struct register_work;
>>> +	atomic_t io;
>>> +	struct list_head wqs;
>>> +	spinlock_t lock;
>>> +	int num;
>>> +};
>>> +
>>> +struct pvcalls_back_global {
>>> +	struct pvcalls_ioworker *ioworkers;
>>> +	int nr_ioworkers;
>>> +	struct workqueue_struct *wq;
>>> +	struct list_head privs;
>>> +	struct rw_semaphore privs_lock;
>>> +} pvcalls_back_global;
>>> +
>>> +static void pvcalls_back_ioworker(struct work_struct *work)
>>> +{
>>> +}
>>> +
>>>  static int pvcalls_back_probe(struct xenbus_device *dev,
>>>  			      const struct xenbus_device_id *id)
>>>  {
>>> @@ -59,3 +79,47 @@ static int pvcalls_back_uevent(struct xenbus_device *xdev,
>>>  	.uevent = pvcalls_back_uevent,
>>>  	.otherend_changed = pvcalls_back_changed,
>>>  };
>>> +
>>> +static int __init pvcalls_back_init(void)
>>> +{
>>> +	int ret, i, cpu;
>>> +
>>> +	if (!xen_domain())
>>> +		return -ENODEV;
>>> +
>>> +	ret = xenbus_register_backend(&pvcalls_back_driver);
>>> +	if (ret < 0)
>>> +		return ret;
>>> +
>>> +	init_rwsem(&pvcalls_back_global.privs_lock);
>>> +	INIT_LIST_HEAD(&pvcalls_back_global.privs);
>>> +	pvcalls_back_global.wq = alloc_workqueue("pvcalls_io", 0, 0);
>>> +	if (!pvcalls_back_global.wq)
>>> +		goto error;
>>> +	pvcalls_back_global.nr_ioworkers = num_online_cpus();
>>
> >> Really? Recently I came across a system with 640 dom0 cpus. I don't think
>> we want 640 workers initialized when loading the backend module. I'd
>> prefer one or a few workers per connected frontend.
> 
> I think we want to keep the ioworker allocation to be based on the
> number of vcpus: we do not want more ioworkers than vcpus because it is
> a waste of resources and leads to worse performance.  Also, given that
> they do memcpy's, I also think it is a good idea to bind them to vcpus
> (and pin vcpus to pcpus) to get best performance.

This will cause a lot of pain for the cpu offline case. Please don't try
to work against the hypervisor scheduler by designing a backend based on
a vcpu pin policy. This might result in best performance for your
special workload, but generally it is a bad idea!

> However, you have a point there: we need to handle systems with an
> extremely large number of Dom0 vcpus. I suggest we introduce an
> upper limit for the number of ioworkers. Something like:
> 
> #define MAX_IOWORKERS 64
> nr_ioworkers = min(MAX_IOWORKERS, num_online_cpus())
> 
> MAX_IOWORKERS could be configurable via a command line option.

Later you are assigning each active socket to exactly one ioworker.
Wouldn't it make more sense to allocate the ioworker when doing
the connect? This would avoid the problem of having only a statistical
distribution, possibly with all sockets on the same ioworker.

Basically you are re-inventing the wheel by using your own workqueue
implementation in each ioworker, looping through all assigned sockets.
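
A sketch of that shape, reusing the patch's names (the per-socket
work_struct field "ioworker" is an assumption):

	static void pvcalls_back_ioworker(struct work_struct *work)
	{
		struct sock_mapping *map = container_of(work,
			struct sock_mapping, ioworker);
		/* do the ring <-> socket copies for this one socket */
	}

	/* at connect time */
	INIT_WORK(&map->ioworker, pvcalls_back_ioworker);
	/* whenever there is I/O pending on this socket */
	queue_work(pvcalls_back_global.wq, &map->ioworker);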


Juergen

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 08/18] xen/pvcalls: implement connect command
  2017-05-16 21:56         ` Boris Ostrovsky
@ 2017-05-18 19:10           ` Stefano Stabellini
  2017-05-18 20:19             ` Boris Ostrovsky
  2017-05-18 20:19             ` Boris Ostrovsky
  2017-05-18 19:10           ` Stefano Stabellini
  1 sibling, 2 replies; 81+ messages in thread
From: Stefano Stabellini @ 2017-05-18 19:10 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Stefano Stabellini, xen-devel, linux-kernel, jgross, Stefano Stabellini

On Tue, 16 May 2017, Boris Ostrovsky wrote:
> >>> +	ret = xenbus_map_ring_valloc(dev, &req->u.connect.ref, 1, &page);
> >>> +	if (ret < 0) {
> >>> +		sock_release(map->sock);
> >>> +		kfree(map);
> >>> +		goto out;
> >>> +	}
> >>> +	map->ring = page;
> >>> +	map->ring_order = map->ring->ring_order;
> >>> +	/* first read the order, then map the data ring */
> >>> +	virt_rmb();
> >>
> >> Not sure I understand what the barrier is for here. I don't think compiler
> >> will reorder ring_order access with the call.
> > It's to avoid using the live version of ring_order to map the data ring
> > pages (the other end could be changing that value at any time). We want
> > to be sure that the compiler doesn't optimize out map->ring_order and
> > use map->ring->ring_order instead.
> 
> Wouldn't WRITE_ONCE(map->ring_order, map->ring->ring_order) be the right
> primitive then?

It doesn't have to be atomic, because right after the assignment we
check if map->ring_order is an appropriate value (see below).


> And also: if the other side changes ring size, what are we mapping then?
> It's obsolete by now.

If the grants are wrong, the mapping hypercalls will fail, the same way
they do with any of the other PV frontends/backends today. That is not
the problem we are trying to address with the barrier.

The issue here is that, by changing map->ring->ring_order at runtime,
the frontend could mount a denial of service, getting the backend into
a busyloop. You can imagine that:

  for (i = 0; i < map->ring->ring_order; i++) {

might not work as the backend expects if map->ring->ring_order can
change at any time.

One could say that the code is already written this way:

  for (i = 0; i < map->ring_order; i++) {

So what's the problem? We have seen instances in the past of the
compiler "optimizing" things so that the assembly actually did:

  for (i = 0; i < map->ring->ring_order; i++) {

This is why I put a barrier there, to avoid such compiler
"optimizations". Does it make sense?

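As a sketch of the full pattern (READ_ONCE() here is an alternative,
an assumption rather than what the patch does, and map_one_page() is
a hypothetical helper):

	u32 order = READ_ONCE(map->ring->ring_order); /* snapshot once */
	virt_rmb();			/* read the order before mapping */
	if (order > MAX_RING_ORDER)	/* validate the snapshot */
		return -EFAULT;
	for (i = 0; i < (1 << order); i++)	/* bound cannot move anymore */
		map_one_page(i);
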

> >>> +	if (map->ring_order > MAX_RING_ORDER) {
> >>> +		ret = -EFAULT;
> >>> +		goto out;
> >>> +	}
> >> If the barrier is indeed needed this check belongs before it.
> > I don't think so, see above.
> >
> >
> >>
> >>> +	ret = xenbus_map_ring_valloc(dev, map->ring->ref,
> >>> +				     (1 << map->ring_order), &page);
> >>> +	if (ret < 0) {
> >>> +		sock_release(map->sock);
> >>> +		xenbus_unmap_ring_vfree(dev, map->ring);
> >>> +		kfree(map);
> >>> +		goto out;
> >>> +	}
> >>> +	map->bytes = page;
> >>>
> 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 08/18] xen/pvcalls: implement connect command
  2017-05-18 19:10           ` Stefano Stabellini
  2017-05-18 20:19             ` Boris Ostrovsky
@ 2017-05-18 20:19             ` Boris Ostrovsky
  1 sibling, 0 replies; 81+ messages in thread
From: Boris Ostrovsky @ 2017-05-18 20:19 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, linux-kernel, jgross, Stefano Stabellini

On 05/18/2017 03:10 PM, Stefano Stabellini wrote:
> On Tue, 16 May 2017, Boris Ostrovsky wrote:
>>>>> +	ret = xenbus_map_ring_valloc(dev, &req->u.connect.ref, 1, &page);
>>>>> +	if (ret < 0) {
>>>>> +		sock_release(map->sock);
>>>>> +		kfree(map);
>>>>> +		goto out;
>>>>> +	}
>>>>> +	map->ring = page;
>>>>> +	map->ring_order = map->ring->ring_order;
>>>>> +	/* first read the order, then map the data ring */
>>>>> +	virt_rmb();
>>>> Not sure I understand what the barrier is for here. I don't think compiler
>>>> will reorder ring_order access with the call.
>>> It's to avoid using the live version of ring_order to map the data ring
>>> pages (the other end could be changing that value at any time). We want
>>> to be sure that the compiler doesn't optimize out map->ring_order and
>>> use map->ring->ring_order instead.
>> Wouldn't WRITE_ONCE(map->ring_order, map->ring->ring_order) be the right
>> primitive then?
> It doesn't have to be atomic, because right after the assignment we
> check if map->ring_order is an appropriate value (see below).

WRITE_ONCE() is not about atomicity, it's about not allowing compilers
to get too aggressive.
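
For reference, the two forms being compared, as a sketch:

	/* the suggestion: the *_ONCE() macros constrain the compiler
	 * (no refetching, caching or tearing of the marked access),
	 * but imply no CPU memory barrier */
	WRITE_ONCE(map->ring_order, map->ring->ring_order);

	/* the patch as posted: plain assignment followed by a
	 * compiler + CPU read barrier */
	map->ring_order = map->ring->ring_order;
	virt_rmb();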

>
>
>> And also: if the other side changes ring size, what are we mapping then?
>> It's obsolete by now.
> If the grants are wrong, the mapping hypercalls will fail, the same way
> they do with any of the other PV frontends/backends today. That is not
> the problem we are trying to address with the barrier.
>
> The issue here is that, by changing map->ring->ring_order at runtime,
> the frontend could mount a denial of service, getting the backend into
> a busyloop. You can imagine that:
>
>   for (i = 0; i < map->ring->ring_order; i++) {
>
> might not work as the backend expects if map->ring->ring_order can
> change at any time.
>
> One could say that the code is already written this way:
>
>   for (i = 0; i < map->ring_order; i++) {
>
> So what's the problem? We have seen instances in the past of the
> compiler "optimizing" things so that the assembly actually did:
>
>   for (i = 0; i < map->ring->ring_order; i++) {
>
> This is why I put a barrier there, to avoid such compiler
> "optimizations". Does it make sense?

Right, I understand all this. I thought you meant that changing
ring_order was part of normal operation (i.e. somewhat expected) and I
couldn't see how that would work.

Thanks for taking time to write this down.

-boris

>
>
>>>>> +	if (map->ring_order > MAX_RING_ORDER) {
>>>>> +		ret = -EFAULT;
>>>>> +		goto out;
>>>>> +	}
>>>> If the barrier is indeed needed this check belongs before it.
>>> I don't think so, see above.
>>>
>>>
>>>>> +	ret = xenbus_map_ring_valloc(dev, map->ring->ref,
>>>>> +				     (1 << map->ring_order), &page);
>>>>> +	if (ret < 0) {
>>>>> +		sock_release(map->sock);
>>>>> +		xenbus_unmap_ring_vfree(dev, map->ring);
>>>>> +		kfree(map);
>>>>> +		goto out;
>>>>> +	}
>>>>> +	map->bytes = page;
>>>>>

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 03/18] xen/pvcalls: initialize the module and register the xenbus backend
  2017-05-17  5:21         ` Juergen Gross
@ 2017-05-18 21:18           ` Stefano Stabellini
  2017-05-19 22:33             ` Stefano Stabellini
  2017-05-19 22:33             ` Stefano Stabellini
  2017-05-18 21:18           ` Stefano Stabellini
  1 sibling, 2 replies; 81+ messages in thread
From: Stefano Stabellini @ 2017-05-18 21:18 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Stefano Stabellini, xen-devel, linux-kernel, boris.ostrovsky,
	Stefano Stabellini

On Wed, 17 May 2017, Juergen Gross wrote:
> On 16/05/17 21:58, Stefano Stabellini wrote:
> > On Tue, 16 May 2017, Juergen Gross wrote:
> >> On 15/05/17 22:35, Stefano Stabellini wrote:
> >>> The pvcalls backend has one ioworker per cpu: the ioworkers are
> >>> implemented as a cpu bound workqueue, and will deal with the actual
> >>> socket and data ring reads/writes.
> >>>
> >>> ioworkers are global: we only have one set for all the frontends. They
> >>> process requests on their wqs list in order, once they are done with a
> >>> request, they'll remove it from the list. A spinlock is used for
> >>> protecting the list. Each ioworker is bound to a different cpu to
> >>> maximize throughput.
> >>>
> >>> Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> >>> CC: boris.ostrovsky@oracle.com
> >>> CC: jgross@suse.com
> >>> ---
> >>>  drivers/xen/pvcalls-back.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++
> >>>  1 file changed, 64 insertions(+)
> >>>
> >>> diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
> >>> index 2dbf7d8..46a889a 100644
> >>> --- a/drivers/xen/pvcalls-back.c
> >>> +++ b/drivers/xen/pvcalls-back.c
> >>> @@ -25,6 +25,26 @@
> >>>  #include <xen/xenbus.h>
> >>>  #include <xen/interface/io/pvcalls.h>
> >>>  
> >>> +struct pvcalls_ioworker {
> >>> +	struct work_struct register_work;
> >>> +	atomic_t io;
> >>> +	struct list_head wqs;
> >>> +	spinlock_t lock;
> >>> +	int num;
> >>> +};
> >>> +
> >>> +struct pvcalls_back_global {
> >>> +	struct pvcalls_ioworker *ioworkers;
> >>> +	int nr_ioworkers;
> >>> +	struct workqueue_struct *wq;
> >>> +	struct list_head privs;
> >>> +	struct rw_semaphore privs_lock;
> >>> +} pvcalls_back_global;
> >>> +
> >>> +static void pvcalls_back_ioworker(struct work_struct *work)
> >>> +{
> >>> +}
> >>> +
> >>>  static int pvcalls_back_probe(struct xenbus_device *dev,
> >>>  			      const struct xenbus_device_id *id)
> >>>  {
> >>> @@ -59,3 +79,47 @@ static int pvcalls_back_uevent(struct xenbus_device *xdev,
> >>>  	.uevent = pvcalls_back_uevent,
> >>>  	.otherend_changed = pvcalls_back_changed,
> >>>  };
> >>> +
> >>> +static int __init pvcalls_back_init(void)
> >>> +{
> >>> +	int ret, i, cpu;
> >>> +
> >>> +	if (!xen_domain())
> >>> +		return -ENODEV;
> >>> +
> >>> +	ret = xenbus_register_backend(&pvcalls_back_driver);
> >>> +	if (ret < 0)
> >>> +		return ret;
> >>> +
> >>> +	init_rwsem(&pvcalls_back_global.privs_lock);
> >>> +	INIT_LIST_HEAD(&pvcalls_back_global.privs);
> >>> +	pvcalls_back_global.wq = alloc_workqueue("pvcalls_io", 0, 0);
> >>> +	if (!pvcalls_back_global.wq)
> >>> +		goto error;
> >>> +	pvcalls_back_global.nr_ioworkers = num_online_cpus();
> >>
> > >> Really? Recently I came across a system with 640 dom0 cpus. I don't think
> >> we want 640 workers initialized when loading the backend module. I'd
> >> prefer one or a few workers per connected frontend.
> > 
> > I think we want to keep the ioworker allocation to be based on the
> > number of vcpus: we do not want more ioworkers than vcpus because it is
> > a waste of resources and leads to worse performance.  Also, given that
> > they do memcpy's, I also think it is a good idea to bind them to vcpus
> > (and pin vcpus to pcpus) to get best performance.
> 
> This will cause a lot of pain for the cpu offline case. Please don't try
> to work against the hypervisor scheduler by designing a backend based on
> a vcpu pin policy. This might result in best performance for your
> special workload, but generally it is a bad idea!

You are right. Of course, vcpu pinning is not a fundamental requirement
for this backend. I wrote about vcpu pinning only to help with the
explanation.

However, pvcalls is a memcpy based protocol, and to perform memcpys
efficiently it is very important to keep caches hot. The target is to hit
the same cacheline when reading and writing, which makes a huge
difference; it depends on processor and architecture but it is easily
20%. To get caching benefits, we need to do memcpys for the same socket
on the same vcpu (and on the same pcpu as well, that's why I mentioned
vcpu pinning, but we'll trust the Xen scheduler to do the right thing
when there is no contention).

This is why in this backend, regardless of the workqueue
design/allocation we use, I think we have to stick to two basic
principles:

- each socket is bound to one vcpu
- sockets are distributed evenly across vcpus
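
As a sketch of those two principles in code (the per-socket "cpu"
field and "ioworker" work item are assumptions, and this presumes
contiguous online cpu ids):

	static atomic_t next_cpu;	/* round-robin across vcpus */

	/* at connect time: pick a cpu once for this socket */
	map->cpu = atomic_inc_return(&next_cpu) % num_online_cpus();

	/* for every I/O request: always queue on that same cpu */
	queue_work_on(map->cpu, pvcalls_back_global.wq, &map->ioworker);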


> > However, you have a point there: we need to handle systems with an
> > extremely large number of Dom0 vcpus. I suggest we introduce an
> > upper limit for the number of ioworkers. Something like:
> > 
> > #define MAX_IOWORKERS 64
> > nr_ioworkers = min(MAX_IOWORKERS, num_online_cpus())
> > 
> > MAX_IOWORKERS could be configurable via a command line option.
> 
> Later you are assigning each active socket to exactly one ioworker.
> Wouldn't it make more sense to allocate the ioworker when doing
> the connect? This would avoid the problem of having only a statistical
> distribution, possibly with all sockets on the same ioworker.
>
> Basically you are re-inventing the wheel by using your own workqueue
> implementation in each ioworker, looping through all assigned sockets.

It might be possible to create an ioworker for each socket (instead of
an ioworker for each vcpu) if we wanted to, as long as we bind each one
to a vcpu and distribute them evenly across vcpus.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 03/18] xen/pvcalls: initialize the module and register the xenbus backend
  2017-05-18 21:18           ` Stefano Stabellini
  2017-05-19 22:33             ` Stefano Stabellini
@ 2017-05-19 22:33             ` Stefano Stabellini
  1 sibling, 0 replies; 81+ messages in thread
From: Stefano Stabellini @ 2017-05-19 22:33 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Juergen Gross, xen-devel, linux-kernel, boris.ostrovsky,
	Stefano Stabellini

On Thu, 18 May 2017, Stefano Stabellini wrote:
> On Wed, 17 May 2017, Juergen Gross wrote:
> > On 16/05/17 21:58, Stefano Stabellini wrote:
> > > On Tue, 16 May 2017, Juergen Gross wrote:
> > >> On 15/05/17 22:35, Stefano Stabellini wrote:
> > >>> The pvcalls backend has one ioworker per cpu: the ioworkers are
> > >>> implemented as a cpu bound workqueue, and will deal with the actual
> > >>> socket and data ring reads/writes.
> > >>>
> > >>> ioworkers are global: we only have one set for all the frontends. They
> > >>> process requests on their wqs list in order, once they are done with a
> > >>> request, they'll remove it from the list. A spinlock is used for
> > >>> protecting the list. Each ioworker is bound to a different cpu to
> > >>> maximize throughput.
> > >>>
> > >>> Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> > >>> CC: boris.ostrovsky@oracle.com
> > >>> CC: jgross@suse.com
> > >>> ---
> > >>>  drivers/xen/pvcalls-back.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++
> > >>>  1 file changed, 64 insertions(+)
> > >>>
> > >>> diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
> > >>> index 2dbf7d8..46a889a 100644
> > >>> --- a/drivers/xen/pvcalls-back.c
> > >>> +++ b/drivers/xen/pvcalls-back.c
> > >>> @@ -25,6 +25,26 @@
> > >>>  #include <xen/xenbus.h>
> > >>>  #include <xen/interface/io/pvcalls.h>
> > >>>  
> > >>> +struct pvcalls_ioworker {
> > >>> +	struct work_struct register_work;
> > >>> +	atomic_t io;
> > >>> +	struct list_head wqs;
> > >>> +	spinlock_t lock;
> > >>> +	int num;
> > >>> +};
> > >>> +
> > >>> +struct pvcalls_back_global {
> > >>> +	struct pvcalls_ioworker *ioworkers;
> > >>> +	int nr_ioworkers;
> > >>> +	struct workqueue_struct *wq;
> > >>> +	struct list_head privs;
> > >>> +	struct rw_semaphore privs_lock;
> > >>> +} pvcalls_back_global;
> > >>> +
> > >>> +static void pvcalls_back_ioworker(struct work_struct *work)
> > >>> +{
> > >>> +}
> > >>> +
> > >>>  static int pvcalls_back_probe(struct xenbus_device *dev,
> > >>>  			      const struct xenbus_device_id *id)
> > >>>  {
> > >>> @@ -59,3 +79,47 @@ static int pvcalls_back_uevent(struct xenbus_device *xdev,
> > >>>  	.uevent = pvcalls_back_uevent,
> > >>>  	.otherend_changed = pvcalls_back_changed,
> > >>>  };
> > >>> +
> > >>> +static int __init pvcalls_back_init(void)
> > >>> +{
> > >>> +	int ret, i, cpu;
> > >>> +
> > >>> +	if (!xen_domain())
> > >>> +		return -ENODEV;
> > >>> +
> > >>> +	ret = xenbus_register_backend(&pvcalls_back_driver);
> > >>> +	if (ret < 0)
> > >>> +		return ret;
> > >>> +
> > >>> +	init_rwsem(&pvcalls_back_global.privs_lock);
> > >>> +	INIT_LIST_HEAD(&pvcalls_back_global.privs);
> > >>> +	pvcalls_back_global.wq = alloc_workqueue("pvcalls_io", 0, 0);
> > >>> +	if (!pvcalls_back_global.wq)
> > >>> +		goto error;
> > >>> +	pvcalls_back_global.nr_ioworkers = num_online_cpus();
> > >>
> > >> Really? Recently I came across a system with 640 dom0 cpus. I don't think
> > >> we want 640 workers initialized when loading the backend module. I'd
> > >> prefer one or a few workers per connected frontend.
> > > 
> > > I think we want to keep the ioworker allocation to be based on the
> > > number of vcpus: we do not want more ioworkers than vcpus because it is
> > > a waste of resources and leads to worse performance.  Also, given that
> > > they do memcpy's, I also think it is a good idea to bind them to vcpus
> > > (and pin vcpus to pcpus) to get best performance.
> > 
> > This will cause a lot of pain for the cpu offline case. Please don't try
> > to work against the hypervisor scheduler by designing a backend based on
> > a vcpu pin policy. This might result in best performance for your
> > special workload, but generally it is a bad idea!
> 
> You are right. Of course, vcpu pinning is not a fundamental requirement
> for this backend. I wrote about vcpu pinning only to help with the
> explanation.
> 
> However, pvcalls is a memcpy based protocol, and to perform memcpys
> efficiently it is very important to keep caches hot. The target is to hit
> the same cacheline when reading and writing, which makes a huge
> difference; it depends on processor and architecture but it is easily
> 20%. To get caching benefits, we need to do memcpys for the same socket
> on the same vcpu (and on the same pcpu as well, that's why I mentioned
> vcpu pinning, but we'll trust the Xen scheduler to do the right thing
> when there is no contention).
> 
> This is why in this backend, regardless of the workqueue
> design/allocation we use, I think we have to stick to two basic
> principles:
> 
> - each socket is bound to one vcpu
> - sockets are distributed evenly across vcpus

[...]

> It might be possible to create an ioworker for each socket (instead of
> an ioworker for each vcpu) if we wanted to, as long as we bind each one
> to a vcpu and distribute them evenly across vcpus.

I don't have access anymore to the large machine I used for the
benchmarks a few months back, but even on my current small testbox
I can see a (small) performance penalty when I don't bind ioworkers to
specific vcpus.

However, using an ioworker per socket, rather than per vcpu, makes the
code much smaller and nicer! Also it makes it much easier to change the
policy between binding, or not binding, ioworkers to specific vcpus.
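
With one work item per socket, the binding policy really becomes a
single line at submission time, e.g. (bind_to_vcpu as a hypothetical
module parameter):

	if (bind_to_vcpu)
		queue_work_on(map->cpu, pvcalls_back_global.wq,
			      &map->ioworker);
	else
		queue_work(pvcalls_back_global.wq, &map->ioworker);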

^ permalink raw reply	[flat|nested] 81+ messages in thread

end of thread, other threads:[~2017-05-19 22:33 UTC | newest]

Thread overview: 81+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-15 20:35 [PATCH 00/18] introduce the Xen PV Calls backend Stefano Stabellini
2017-05-15 20:35 ` [PATCH 01/18] xen: introduce the pvcalls interface header Stefano Stabellini
2017-05-15 20:35   ` [PATCH 02/18] xen/pvcalls: introduce the pvcalls xenbus backend Stefano Stabellini
2017-05-15 20:35   ` Stefano Stabellini
2017-05-15 20:35   ` [PATCH 03/18] xen/pvcalls: initialize the module and register the " Stefano Stabellini
2017-05-15 20:35     ` Stefano Stabellini
2017-05-16  1:28     ` Boris Ostrovsky
2017-05-16 20:05       ` Stefano Stabellini
2017-05-16 20:05       ` Stefano Stabellini
2017-05-16 20:22         ` Stefano Stabellini
2017-05-16 20:22         ` Stefano Stabellini
2017-05-16  1:28     ` Boris Ostrovsky
2017-05-16  6:40     ` Juergen Gross
2017-05-16 19:58       ` Stefano Stabellini
2017-05-16 19:58       ` Stefano Stabellini
2017-05-17  5:21         ` Juergen Gross
2017-05-17  5:21         ` Juergen Gross
2017-05-18 21:18           ` Stefano Stabellini
2017-05-19 22:33             ` Stefano Stabellini
2017-05-19 22:33             ` Stefano Stabellini
2017-05-18 21:18           ` Stefano Stabellini
2017-05-16  6:40     ` Juergen Gross
2017-05-15 20:35   ` [PATCH 04/18] xen/pvcalls: xenbus state handling Stefano Stabellini
2017-05-15 20:35     ` Stefano Stabellini
2017-05-16  1:34     ` Boris Ostrovsky
2017-05-16  1:34       ` Boris Ostrovsky
2017-05-16 20:11       ` Stefano Stabellini
2017-05-16 20:11       ` Stefano Stabellini
2017-05-15 20:35   ` [PATCH 05/18] xen/pvcalls: connect to a frontend Stefano Stabellini
2017-05-15 20:35     ` Stefano Stabellini
2017-05-16  1:52     ` Boris Ostrovsky
2017-05-16  1:52     ` Boris Ostrovsky
2017-05-16 20:23       ` Stefano Stabellini
2017-05-16 20:23       ` Stefano Stabellini
2017-05-16 20:38         ` Stefano Stabellini
2017-05-16 20:38         ` Stefano Stabellini
2017-05-15 20:35   ` [PATCH 06/18] xen/pvcalls: handle commands from the frontend Stefano Stabellini
2017-05-15 20:35     ` Stefano Stabellini
2017-05-16  2:06     ` Boris Ostrovsky
2017-05-16 20:57       ` Stefano Stabellini
2017-05-16 20:57       ` Stefano Stabellini
2017-05-16  2:06     ` Boris Ostrovsky
2017-05-15 20:35   ` [PATCH 07/18] xen/pvcalls: implement socket command Stefano Stabellini
2017-05-15 20:35     ` Stefano Stabellini
2017-05-16  2:12     ` Boris Ostrovsky
2017-05-16 20:45       ` Stefano Stabellini
2017-05-16 20:45       ` Stefano Stabellini
2017-05-16  2:12     ` Boris Ostrovsky
2017-05-15 20:36   ` [PATCH 08/18] xen/pvcalls: implement connect command Stefano Stabellini
2017-05-16  2:36     ` Boris Ostrovsky
2017-05-16 21:02       ` Stefano Stabellini
2017-05-16 21:02       ` Stefano Stabellini
2017-05-16 21:56         ` Boris Ostrovsky
2017-05-16 21:56         ` Boris Ostrovsky
2017-05-18 19:10           ` Stefano Stabellini
2017-05-18 20:19             ` Boris Ostrovsky
2017-05-18 20:19             ` Boris Ostrovsky
2017-05-18 19:10           ` Stefano Stabellini
2017-05-16  2:36     ` Boris Ostrovsky
2017-05-15 20:36   ` Stefano Stabellini
2017-05-15 20:36   ` [PATCH 09/18] xen/pvcalls: implement bind command Stefano Stabellini
2017-05-15 20:36     ` Stefano Stabellini
2017-05-15 20:36   ` [PATCH 10/18] xen/pvcalls: implement listen command Stefano Stabellini
2017-05-15 20:36   ` Stefano Stabellini
2017-05-15 20:36   ` [PATCH 11/18] xen/pvcalls: implement accept command Stefano Stabellini
2017-05-15 20:36     ` Stefano Stabellini
2017-05-15 20:36   ` [PATCH 12/18] xen/pvcalls: implement poll command Stefano Stabellini
2017-05-15 20:36     ` Stefano Stabellini
2017-05-15 20:36   ` [PATCH 13/18] xen/pvcalls: implement release command Stefano Stabellini
2017-05-15 20:36     ` Stefano Stabellini
2017-05-15 20:36   ` [PATCH 14/18] xen/pvcalls: disconnect and module_exit Stefano Stabellini
2017-05-15 20:36   ` Stefano Stabellini
2017-05-15 20:36   ` [PATCH 15/18] xen/pvcalls: introduce the ioworker Stefano Stabellini
2017-05-15 20:36   ` Stefano Stabellini
2017-05-15 20:36   ` [PATCH 16/18] xen/pvcalls: implement read Stefano Stabellini
2017-05-15 20:36     ` Stefano Stabellini
2017-05-15 20:36   ` [PATCH 17/18] xen/pvcalls: implement write Stefano Stabellini
2017-05-15 20:36   ` Stefano Stabellini
2017-05-15 20:36   ` [PATCH 18/18] xen: introduce a Kconfig option to enable the pvcalls backend Stefano Stabellini
2017-05-15 20:36     ` Stefano Stabellini
2017-05-15 20:35 ` [PATCH 01/18] xen: introduce the pvcalls interface header Stefano Stabellini
