* [PATCH v5 01/13] xen/pvcalls: introduce the pvcalls xenbus frontend
  2017-10-07  0:30 [PATCH v5 00/13] introduce the Xen PV Calls frontend Stefano Stabellini
@ 2017-10-07  0:30   ` Stefano Stabellini
  0 siblings, 0 replies; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-07  0:30 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

Introduce a xenbus frontend for the pvcalls protocol, as defined by
https://xenbits.xen.org/docs/unstable/misc/pvcalls.html.

This patch only adds the stubs; the code will be added by the following
patches.

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-front.c | 61 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)
 create mode 100644 drivers/xen/pvcalls-front.c

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
new file mode 100644
index 0000000..a8d38c2
--- /dev/null
+++ b/drivers/xen/pvcalls-front.c
@@ -0,0 +1,61 @@
+/*
+ * (c) 2017 Stefano Stabellini <stefano@aporeto.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/module.h>
+
+#include <xen/events.h>
+#include <xen/grant_table.h>
+#include <xen/xen.h>
+#include <xen/xenbus.h>
+#include <xen/interface/io/pvcalls.h>
+
+static const struct xenbus_device_id pvcalls_front_ids[] = {
+	{ "pvcalls" },
+	{ "" }
+};
+
+static int pvcalls_front_remove(struct xenbus_device *dev)
+{
+	return 0;
+}
+
+static int pvcalls_front_probe(struct xenbus_device *dev,
+			  const struct xenbus_device_id *id)
+{
+	return 0;
+}
+
+static void pvcalls_front_changed(struct xenbus_device *dev,
+			    enum xenbus_state backend_state)
+{
+}
+
+static struct xenbus_driver pvcalls_front_driver = {
+	.ids = pvcalls_front_ids,
+	.probe = pvcalls_front_probe,
+	.remove = pvcalls_front_remove,
+	.otherend_changed = pvcalls_front_changed,
+};
+
+static int __init pvcalls_frontend_init(void)
+{
+	if (!xen_domain())
+		return -ENODEV;
+
+	pr_info("Initialising Xen pvcalls frontend driver\n");
+
+	return xenbus_register_frontend(&pvcalls_front_driver);
+}
+
+module_init(pvcalls_frontend_init);
-- 
1.9.1

* [PATCH v5 02/13] xen/pvcalls: implement frontend disconnect
  2017-10-07  0:30   ` Stefano Stabellini
@ 2017-10-07  0:30   ` Stefano Stabellini
  2017-10-17 16:01     ` Boris Ostrovsky
  -1 siblings, 2 replies; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-07  0:30 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

Introduce a data structure named pvcalls_bedata. It contains pointers to
the command ring, the event channel, a list of active sockets and a list
of passive sockets. List accesses are protected by a spinlock.

Introduce a waitqueue to allow waiting for a response on commands sent
to the backend.

Introduce an array of struct xen_pvcalls_response to store command
responses.

pvcalls_refcount is used to keep count of the outstanding pvcalls users.
Only remove connections once the refcount is zero.

Implement the pvcalls frontend removal function. Go through the lists of
active and passive sockets and free them all, one at a time.
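
A minimal sketch of the intended usage pattern (the function name below
is hypothetical; the later patches in this series apply this pattern to
the exported pvcalls_front_* calls):

	int pvcalls_front_some_op(void)
	{
		pvcalls_enter();
		if (!pvcalls_front_dev) {
			pvcalls_exit();
			return -ENOTCONN;
		}
		/*
		 * Frontend state can be dereferenced safely here:
		 * pvcalls_front_remove() spins until pvcalls_refcount
		 * drops to zero before freeing it.
		 */
		pvcalls_exit();
		return 0;
	}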

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-front.c | 67 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index a8d38c2..d8b7a04 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -20,6 +20,46 @@
 #include <xen/xenbus.h>
 #include <xen/interface/io/pvcalls.h>
 
+#define PVCALLS_INVALID_ID UINT_MAX
+#define PVCALLS_RING_ORDER XENBUS_MAX_RING_GRANT_ORDER
+#define PVCALLS_NR_REQ_PER_RING __CONST_RING_SIZE(xen_pvcalls, XEN_PAGE_SIZE)
+
+struct pvcalls_bedata {
+	struct xen_pvcalls_front_ring ring;
+	grant_ref_t ref;
+	int irq;
+
+	struct list_head socket_mappings;
+	struct list_head socketpass_mappings;
+	spinlock_t socket_lock;
+
+	wait_queue_head_t inflight_req;
+	struct xen_pvcalls_response rsp[PVCALLS_NR_REQ_PER_RING];
+};
+/* Only one front/back connection supported. */
+static struct xenbus_device *pvcalls_front_dev;
+static atomic_t pvcalls_refcount;
+
+/* first increment refcount, then proceed */
+#define pvcalls_enter() {               \
+	atomic_inc(&pvcalls_refcount);      \
+}
+
+/* first complete other operations, then decrement refcount */
+#define pvcalls_exit() {                \
+	atomic_dec(&pvcalls_refcount);      \
+}
+
+static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
+{
+	return IRQ_HANDLED;
+}
+
+static void pvcalls_front_free_map(struct pvcalls_bedata *bedata,
+				   struct sock_mapping *map)
+{
+}
+
 static const struct xenbus_device_id pvcalls_front_ids[] = {
 	{ "pvcalls" },
 	{ "" }
@@ -27,6 +67,33 @@
 
 static int pvcalls_front_remove(struct xenbus_device *dev)
 {
+	struct pvcalls_bedata *bedata;
+	struct sock_mapping *map = NULL, *n;
+
+	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
+	dev_set_drvdata(&dev->dev, NULL);
+	pvcalls_front_dev = NULL;
+	if (bedata->irq >= 0)
+		unbind_from_irqhandler(bedata->irq, dev);
+
+	smp_mb();
+	while (atomic_read(&pvcalls_refcount) > 0)
+		cpu_relax();
+	list_for_each_entry_safe(map, n, &bedata->socket_mappings, list) {
+		pvcalls_front_free_map(bedata, map);
+		kfree(map);
+	}
+	list_for_each_entry_safe(map, n, &bedata->socketpass_mappings, list) {
+		spin_lock(&bedata->socket_lock);
+		list_del_init(&map->list);
+		spin_unlock(&bedata->socket_lock);
+		kfree(map);
+	}
+	if (bedata->ref >= 0)
+		gnttab_end_foreign_access(bedata->ref, 0, 0);
+	kfree(bedata->ring.sring);
+	kfree(bedata);
+	xenbus_switch_state(dev, XenbusStateClosed);
 	return 0;
 }
 
-- 
1.9.1

* [PATCH v5 00/13] introduce the Xen PV Calls frontend
@ 2017-10-07  0:30 Stefano Stabellini
  2017-10-07  0:30   ` Stefano Stabellini
  0 siblings, 1 reply; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-07  0:30 UTC (permalink / raw)
  To: xen-devel; +Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky

Hi all,

this series introduces the frontend for the newly introduced PV Calls
protocol.

PV Calls is a paravirtualized protocol that allows the implementation of
a set of POSIX functions in a different domain. The PV Calls frontend
sends POSIX function calls to the backend, which acts on them and
returns the result to the frontend.

For more information about PV Calls, please read:

https://xenbits.xen.org/docs/unstable/misc/pvcalls.html

This patch series only implements the frontend driver. It doesn't
attempt to redirect POSIX calls to it. The functions exported in
pvcalls-front.h are meant to be used for that. A separate patch series
will be sent to use them and hook them into the system.
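
As a rough sketch of what such a consumer could look like
(pvcalls_stream_connect is a made-up name for illustration; only the
pvcalls_front_* prototypes come from pvcalls-front.h):

	#include "pvcalls-front.h"

	/* Hypothetical glue: forward a connect handler to PV Calls. */
	static int pvcalls_stream_connect(struct socket *sock,
					  struct sockaddr *addr,
					  int addr_len, int flags)
	{
		return pvcalls_front_connect(sock, addr, addr_len, flags);
	}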


Changes in v5:
- add a comment about single frontend-backend connection
- remove atomic_inc/dec in pvcalls_enter/exit
- add reviewed-by
- redefine pvcalls_enter/exit as pvcalls_enter/exit()
- move initial error checks in functions as early as possible (before
  locks, refcounts, etc)
- remove WRITE_ONCE when used after a barrier
- fix code style
- move init_waitqueue_head(&map->passive.inflight_accept_req) to patch #8
- clear PVCALLS_FLAG_ACCEPT_INFLIGHT on errors in accept
- clear INFLIGHT then set RET in pvcalls_front_event_handler, see
  alpine.DEB.2.10.1710061515060.3073@sstabellini-ThinkPad-X260
- add smp_rmb() after reading req_id before reading ret in all functions
- remove smp_mb() after reading ret and before clearing req_id
- create an empty pvcalls_front_free_map in patch #2


Stefano Stabellini (13):
      xen/pvcalls: introduce the pvcalls xenbus frontend
      xen/pvcalls: implement frontend disconnect
      xen/pvcalls: connect to the backend
      xen/pvcalls: implement socket command and handle events
      xen/pvcalls: implement connect command
      xen/pvcalls: implement bind command
      xen/pvcalls: implement listen command
      xen/pvcalls: implement accept command
      xen/pvcalls: implement sendmsg
      xen/pvcalls: implement recvmsg
      xen/pvcalls: implement poll command
      xen/pvcalls: implement release command
      xen: introduce a Kconfig option to enable the pvcalls frontend

 drivers/xen/Kconfig         |    9 +
 drivers/xen/Makefile        |    1 +
 drivers/xen/pvcalls-front.c | 1275 +++++++++++++++++++++++++++++++++++++++++++
 drivers/xen/pvcalls-front.h |   28 +
 4 files changed, 1313 insertions(+)
 create mode 100644 drivers/xen/pvcalls-front.c
 create mode 100644 drivers/xen/pvcalls-front.h

* [PATCH v5 03/13] xen/pvcalls: connect to the backend
  2017-10-07  0:30   ` Stefano Stabellini
@ 2017-10-07  0:30     ` Stefano Stabellini
  -1 siblings, 0 replies; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-07  0:30 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

Implement the probe function for the pvcalls frontend. Read the
supported versions, max-page-order and function-calls nodes from
xenstore.

Only one frontend<->backend connection is supported at any given time
for a guest. Store the active frontend device in a static pointer.

Introduce a stub function for the event handler.
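
For illustration, a sketch of the xenstore nodes involved (exact paths
depend on the toolstack; the node names are the ones read and written by
this patch):

	# advertised by the backend under dev->otherend
	versions        = "1"
	max-page-order  = "<order, must be >= PVCALLS_RING_ORDER>"
	function-calls  = "1"

	# written by the frontend under dev->nodename on success
	version   = "1"
	ring-ref  = "<grant reference of the command ring>"
	port      = "<event channel port>"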

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-front.c | 133 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 133 insertions(+)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index d8b7a04..490c4c1 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -100,12 +100,145 @@ static int pvcalls_front_remove(struct xenbus_device *dev)
 static int pvcalls_front_probe(struct xenbus_device *dev,
 			  const struct xenbus_device_id *id)
 {
+	int ret = -ENOMEM, evtchn, i;
+	unsigned int max_page_order, function_calls, len;
+	char *versions;
+	grant_ref_t gref_head = 0;
+	struct xenbus_transaction xbt;
+	struct pvcalls_bedata *bedata = NULL;
+	struct xen_pvcalls_sring *sring;
+
+	if (pvcalls_front_dev != NULL) {
+		dev_err(&dev->dev, "only one PV Calls connection supported\n");
+		return -EINVAL;
+	}
+
+	versions = xenbus_read(XBT_NIL, dev->otherend, "versions", &len);
+	if (!len)
+		return -EINVAL;
+	if (strcmp(versions, "1")) {
+		kfree(versions);
+		return -EINVAL;
+	}
+	kfree(versions);
+	max_page_order = xenbus_read_unsigned(dev->otherend,
+					      "max-page-order", 0);
+	if (max_page_order < PVCALLS_RING_ORDER)
+		return -ENODEV;
+	function_calls = xenbus_read_unsigned(dev->otherend,
+					      "function-calls", 0);
+	/* See XENBUS_FUNCTIONS_CALLS in pvcalls.h */
+	if (function_calls != 1)
+		return -ENODEV;
+	pr_info("%s max-page-order is %u\n", __func__, max_page_order);
+
+	bedata = kzalloc(sizeof(struct pvcalls_bedata), GFP_KERNEL);
+	if (!bedata)
+		return -ENOMEM;
+
+	dev_set_drvdata(&dev->dev, bedata);
+	pvcalls_front_dev = dev;
+	init_waitqueue_head(&bedata->inflight_req);
+	INIT_LIST_HEAD(&bedata->socket_mappings);
+	INIT_LIST_HEAD(&bedata->socketpass_mappings);
+	spin_lock_init(&bedata->socket_lock);
+	bedata->irq = -1;
+	bedata->ref = -1;
+
+	for (i = 0; i < PVCALLS_NR_REQ_PER_RING; i++)
+		bedata->rsp[i].req_id = PVCALLS_INVALID_ID;
+
+	sring = (struct xen_pvcalls_sring *) __get_free_page(GFP_KERNEL |
+							     __GFP_ZERO);
+	if (!sring)
+		goto error;
+	SHARED_RING_INIT(sring);
+	FRONT_RING_INIT(&bedata->ring, sring, XEN_PAGE_SIZE);
+
+	ret = xenbus_alloc_evtchn(dev, &evtchn);
+	if (ret)
+		goto error;
+
+	bedata->irq = bind_evtchn_to_irqhandler(evtchn,
+						pvcalls_front_event_handler,
+						0, "pvcalls-frontend", dev);
+	if (bedata->irq < 0) {
+		ret = bedata->irq;
+		goto error;
+	}
+
+	ret = gnttab_alloc_grant_references(1, &gref_head);
+	if (ret < 0)
+		goto error;
+	bedata->ref = gnttab_claim_grant_reference(&gref_head);
+	if (bedata->ref < 0) {
+		ret = bedata->ref;
+		goto error;
+	}
+	gnttab_grant_foreign_access_ref(bedata->ref, dev->otherend_id,
+					virt_to_gfn((void *)sring), 0);
+
+ again:
+	ret = xenbus_transaction_start(&xbt);
+	if (ret) {
+		xenbus_dev_fatal(dev, ret, "starting transaction");
+		goto error;
+	}
+	ret = xenbus_printf(xbt, dev->nodename, "version", "%u", 1);
+	if (ret)
+		goto error_xenbus;
+	ret = xenbus_printf(xbt, dev->nodename, "ring-ref", "%d", bedata->ref);
+	if (ret)
+		goto error_xenbus;
+	ret = xenbus_printf(xbt, dev->nodename, "port", "%u",
+			    evtchn);
+	if (ret)
+		goto error_xenbus;
+	ret = xenbus_transaction_end(xbt, 0);
+	if (ret) {
+		if (ret == -EAGAIN)
+			goto again;
+		xenbus_dev_fatal(dev, ret, "completing transaction");
+		goto error;
+	}
+	xenbus_switch_state(dev, XenbusStateInitialised);
+
 	return 0;
+
+ error_xenbus:
+	xenbus_transaction_end(xbt, 1);
+	xenbus_dev_fatal(dev, ret, "writing xenstore");
+ error:
+	pvcalls_front_remove(dev);
+	return ret;
 }
 
 static void pvcalls_front_changed(struct xenbus_device *dev,
 			    enum xenbus_state backend_state)
 {
+	switch (backend_state) {
+	case XenbusStateReconfiguring:
+	case XenbusStateReconfigured:
+	case XenbusStateInitialising:
+	case XenbusStateInitialised:
+	case XenbusStateUnknown:
+		break;
+
+	case XenbusStateInitWait:
+		break;
+
+	case XenbusStateConnected:
+		xenbus_switch_state(dev, XenbusStateConnected);
+		break;
+
+	case XenbusStateClosed:
+		if (dev->state == XenbusStateClosed)
+			break;
+		/* Missed the backend's CLOSING state -- fallthrough */
+	case XenbusStateClosing:
+		xenbus_frontend_closed(dev);
+		break;
+	}
 }
 
 static struct xenbus_driver pvcalls_front_driver = {
-- 
1.9.1

* [PATCH v5 04/13] xen/pvcalls: implement socket command and handle events
  2017-10-07  0:30   ` Stefano Stabellini
@ 2017-10-07  0:30     ` Stefano Stabellini
  -1 siblings, 0 replies; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-07  0:30 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

Send a PVCALLS_SOCKET command to the backend, using the masked
req_prod_pvt as req_id. This way, req_id is guaranteed to be between 0
and PVCALLS_NR_REQ_PER_RING - 1. We already have a slot in the rsp array
ready for the response, and there cannot be two outstanding responses
with the same req_id.

Wait for the response by waiting on the inflight_req waitqueue and
checking the req_id field in rsp[req_id]. Use atomic accesses and
barriers to read the field. Note that the barriers are simple smp
barriers (as opposed to virt barriers) because they are for internal
frontend synchronization, not frontend<->backend communication.

Once a response is received, clear the corresponding rsp slot by setting
req_id to PVCALLS_INVALID_ID. Note that PVCALLS_INVALID_ID is invalid
only from the frontend point of view. It is not part of the PVCalls
protocol.

pvcalls_front_event_handler is in charge of copying responses from the
ring to the appropriate rsp slot. It is done by copying the body of the
response first, then by copying req_id atomically. After the copies,
wake up anybody waiting on the waitqueue.

socket_lock protects accesses to the ring.

Create a new struct sock_mapping, convert the pointer into a uint64_t,
and use it as the id for the new socket to pass to the backend. The
struct will be fully initialized later on connect or bind. In this patch
the struct sock_mapping is empty; the fields will be added by the next
patch.

sock->sk->sk_send_head is not used for ip sockets: reuse the field to
store a pointer to the struct sock_mapping corresponding to the socket.
This way, we can easily get the struct sock_mapping from the struct
socket.
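
A short sketch of the round trip this enables (shown only for
illustration; the connect patch later in this series follows this
pattern):

	/* at socket creation time (this patch) */
	sock->sk->sk_send_head = (void *)map;

	/* in a later command, e.g. connect (next patches) */
	map = (struct sock_mapping *)sock->sk->sk_send_head;
	req->u.connect.id = (uint64_t)map;	/* opaque id for the backend */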

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-front.c | 133 ++++++++++++++++++++++++++++++++++++++++++++
 drivers/xen/pvcalls-front.h |   8 +++
 2 files changed, 141 insertions(+)
 create mode 100644 drivers/xen/pvcalls-front.h

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index 490c4c1..95a985c 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -20,6 +20,8 @@
 #include <xen/xenbus.h>
 #include <xen/interface/io/pvcalls.h>
 
+#include "pvcalls-front.h"
+
 #define PVCALLS_INVALID_ID UINT_MAX
 #define PVCALLS_RING_ORDER XENBUS_MAX_RING_GRANT_ORDER
 #define PVCALLS_NR_REQ_PER_RING __CONST_RING_SIZE(xen_pvcalls, XEN_PAGE_SIZE)
@@ -50,8 +52,64 @@ struct pvcalls_bedata {
 	atomic_dec(&pvcalls_refcount);      \
 }
 
+struct sock_mapping {
+	bool active_socket;
+	struct list_head list;
+	struct socket *sock;
+};
+
+static inline int get_request(struct pvcalls_bedata *bedata, int *req_id)
+{
+	*req_id = bedata->ring.req_prod_pvt & (RING_SIZE(&bedata->ring) - 1);
+	if (RING_FULL(&bedata->ring) ||
+	    bedata->rsp[*req_id].req_id != PVCALLS_INVALID_ID)
+		return -EAGAIN;
+	return 0;
+}
+
 static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
 {
+	struct xenbus_device *dev = dev_id;
+	struct pvcalls_bedata *bedata;
+	struct xen_pvcalls_response *rsp;
+	uint8_t *src, *dst;
+	int req_id = 0, more = 0, done = 0;
+
+	if (dev == NULL)
+		return IRQ_HANDLED;
+
+	pvcalls_enter();
+	bedata = dev_get_drvdata(&dev->dev);
+	if (bedata == NULL) {
+		pvcalls_exit();
+		return IRQ_HANDLED;
+	}
+
+again:
+	while (RING_HAS_UNCONSUMED_RESPONSES(&bedata->ring)) {
+		rsp = RING_GET_RESPONSE(&bedata->ring, bedata->ring.rsp_cons);
+
+		req_id = rsp->req_id;
+		dst = (uint8_t *)&bedata->rsp[req_id] + sizeof(rsp->req_id);
+		src = (uint8_t *)rsp + sizeof(rsp->req_id);
+		memcpy(dst, src, sizeof(*rsp) - sizeof(rsp->req_id));
+		/*
+		 * First copy the rest of the data, then req_id. It is
+		 * paired with the barrier when accessing bedata->rsp.
+		 */
+		smp_wmb();
+		bedata->rsp[req_id].req_id = rsp->req_id;
+
+		done = 1;
+		bedata->ring.rsp_cons++;
+	}
+
+	RING_FINAL_CHECK_FOR_RESPONSES(&bedata->ring, more);
+	if (more)
+		goto again;
+	if (done)
+		wake_up(&bedata->inflight_req);
+	pvcalls_exit();
 	return IRQ_HANDLED;
 }
 
@@ -60,6 +118,81 @@ static void pvcalls_front_free_map(struct pvcalls_bedata *bedata,
 {
 }
 
+int pvcalls_front_socket(struct socket *sock)
+{
+	struct pvcalls_bedata *bedata;
+	struct sock_mapping *map = NULL;
+	struct xen_pvcalls_request *req;
+	int notify, req_id, ret;
+
+	/*
+	 * PVCalls only supports domain AF_INET,
+	 * type SOCK_STREAM and protocol 0 sockets for now.
+	 *
+	 * Check socket type here, AF_INET and protocol checks are done
+	 * by the caller.
+	 */
+	if (sock->type != SOCK_STREAM)
+		return -ENOTSUPP;
+
+	pvcalls_enter();
+	if (!pvcalls_front_dev) {
+		pvcalls_exit();
+		return -EACCES;
+	}
+	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
+
+	map = kzalloc(sizeof(*map), GFP_KERNEL);
+	if (map == NULL) {
+		pvcalls_exit();
+		return -ENOMEM;
+	}
+
+	spin_lock(&bedata->socket_lock);
+
+	ret = get_request(bedata, &req_id);
+	if (ret < 0) {
+		kfree(map);
+		spin_unlock(&bedata->socket_lock);
+		pvcalls_exit();
+		return ret;
+	}
+
+	/*
+	 * sock->sk->sk_send_head is not used for ip sockets: reuse the
+	 * field to store a pointer to the struct sock_mapping
+	 * corresponding to the socket. This way, we can easily get the
+	 * struct sock_mapping from the struct socket.
+	 */
+	sock->sk->sk_send_head = (void *)map;
+	list_add_tail(&map->list, &bedata->socket_mappings);
+
+	req = RING_GET_REQUEST(&bedata->ring, req_id);
+	req->req_id = req_id;
+	req->cmd = PVCALLS_SOCKET;
+	req->u.socket.id = (uint64_t) map;
+	req->u.socket.domain = AF_INET;
+	req->u.socket.type = SOCK_STREAM;
+	req->u.socket.protocol = IPPROTO_IP;
+
+	bedata->ring.req_prod_pvt++;
+	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify);
+	spin_unlock(&bedata->socket_lock);
+	if (notify)
+		notify_remote_via_irq(bedata->irq);
+
+	wait_event(bedata->inflight_req,
+		   READ_ONCE(bedata->rsp[req_id].req_id) == req_id);
+
+	/* read req_id, then the content */
+	smp_rmb();
+	ret = bedata->rsp[req_id].ret;
+	bedata->rsp[req_id].req_id = PVCALLS_INVALID_ID;
+
+	pvcalls_exit();
+	return ret;
+}
+
 static const struct xenbus_device_id pvcalls_front_ids[] = {
 	{ "pvcalls" },
 	{ "" }
diff --git a/drivers/xen/pvcalls-front.h b/drivers/xen/pvcalls-front.h
new file mode 100644
index 0000000..b7dabed
--- /dev/null
+++ b/drivers/xen/pvcalls-front.h
@@ -0,0 +1,8 @@
+#ifndef __PVCALLS_FRONT_H__
+#define __PVCALLS_FRONT_H__
+
+#include <linux/net.h>
+
+int pvcalls_front_socket(struct socket *sock);
+
+#endif
-- 
1.9.1

* [PATCH v5 05/13] xen/pvcalls: implement connect command
  2017-10-07  0:30   ` Stefano Stabellini
@ 2017-10-07  0:30     ` Stefano Stabellini
  -1 siblings, 0 replies; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-07  0:30 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

Send PVCALLS_CONNECT to the backend. Allocate a new ring and evtchn for
the active socket.

Introduce fields in struct sock_mapping to keep track of active sockets.
Introduce a waitqueue to allow the frontend to wait on data coming from
the backend on the active socket (recvmsg command).

Two mutexes (one for reads and one for writes) will be used to protect
the active socket in and out rings from concurrent accesses.
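
A minimal sketch of the intended locking, assuming the sendmsg/recvmsg
patches later in this series (not code from this patch):

	/* writers of the "out" data ring serialize on out_mutex */
	mutex_lock(&map->active.out_mutex);
	/* ... copy data from the msghdr into map->active.data.out ... */
	mutex_unlock(&map->active.out_mutex);

	/* readers of the "in" data ring serialize on in_mutex */
	mutex_lock(&map->active.in_mutex);
	/* ... copy data from map->active.data.in into the msghdr ... */
	mutex_unlock(&map->active.in_mutex);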

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-front.c | 162 ++++++++++++++++++++++++++++++++++++++++++++
 drivers/xen/pvcalls-front.h |   2 +
 2 files changed, 164 insertions(+)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index 95a985c..7c9261b 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -13,6 +13,10 @@
  */
 
 #include <linux/module.h>
+#include <linux/net.h>
+#include <linux/socket.h>
+
+#include <net/sock.h>
 
 #include <xen/events.h>
 #include <xen/grant_table.h>
@@ -56,6 +60,18 @@ struct sock_mapping {
 	bool active_socket;
 	struct list_head list;
 	struct socket *sock;
+	union {
+		struct {
+			int irq;
+			grant_ref_t ref;
+			struct pvcalls_data_intf *ring;
+			struct pvcalls_data data;
+			struct mutex in_mutex;
+			struct mutex out_mutex;
+
+			wait_queue_head_t inflight_conn_req;
+		} active;
+	};
 };
 
 static inline int get_request(struct pvcalls_bedata *bedata, int *req_id)
@@ -118,6 +134,18 @@ static void pvcalls_front_free_map(struct pvcalls_bedata *bedata,
 {
 }
 
+static irqreturn_t pvcalls_front_conn_handler(int irq, void *sock_map)
+{
+	struct sock_mapping *map = sock_map;
+
+	if (map == NULL)
+		return IRQ_HANDLED;
+
+	wake_up_interruptible(&map->active.inflight_conn_req);
+
+	return IRQ_HANDLED;
+}
+
 int pvcalls_front_socket(struct socket *sock)
 {
 	struct pvcalls_bedata *bedata;
@@ -193,6 +221,132 @@ int pvcalls_front_socket(struct socket *sock)
 	return ret;
 }
 
+static int create_active(struct sock_mapping *map, int *evtchn)
+{
+	void *bytes;
+	int ret = -ENOMEM, irq = -1, i;
+
+	*evtchn = -1;
+	init_waitqueue_head(&map->active.inflight_conn_req);
+
+	map->active.ring = (struct pvcalls_data_intf *)
+		__get_free_page(GFP_KERNEL | __GFP_ZERO);
+	if (map->active.ring == NULL)
+		goto out_error;
+	map->active.ring->ring_order = PVCALLS_RING_ORDER;
+	bytes = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+					PVCALLS_RING_ORDER);
+	if (bytes == NULL)
+		goto out_error;
+	for (i = 0; i < (1 << PVCALLS_RING_ORDER); i++)
+		map->active.ring->ref[i] = gnttab_grant_foreign_access(
+			pvcalls_front_dev->otherend_id,
+			pfn_to_gfn(virt_to_pfn(bytes) + i), 0);
+
+	map->active.ref = gnttab_grant_foreign_access(
+		pvcalls_front_dev->otherend_id,
+		pfn_to_gfn(virt_to_pfn((void *)map->active.ring)), 0);
+
+	map->active.data.in = bytes;
+	map->active.data.out = bytes +
+		XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER);
+
+	ret = xenbus_alloc_evtchn(pvcalls_front_dev, evtchn);
+	if (ret)
+		goto out_error;
+	irq = bind_evtchn_to_irqhandler(*evtchn, pvcalls_front_conn_handler,
+					0, "pvcalls-frontend", map);
+	if (irq < 0) {
+		ret = irq;
+		goto out_error;
+	}
+
+	map->active.irq = irq;
+	map->active_socket = true;
+	mutex_init(&map->active.in_mutex);
+	mutex_init(&map->active.out_mutex);
+
+	return 0;
+
+out_error:
+	if (irq >= 0)
+		unbind_from_irqhandler(irq, map);
+	else if (*evtchn >= 0)
+		xenbus_free_evtchn(pvcalls_front_dev, *evtchn);
+	kfree(map->active.data.in);
+	kfree(map->active.ring);
+	return ret;
+}
+
+int pvcalls_front_connect(struct socket *sock, struct sockaddr *addr,
+				int addr_len, int flags)
+{
+	struct pvcalls_bedata *bedata;
+	struct sock_mapping *map = NULL;
+	struct xen_pvcalls_request *req;
+	int notify, req_id, ret, evtchn;
+
+	if (addr->sa_family != AF_INET || sock->type != SOCK_STREAM)
+		return -ENOTSUPP;
+
+	pvcalls_enter();
+	if (!pvcalls_front_dev) {
+		pvcalls_exit();
+		return -ENETUNREACH;
+	}
+
+	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
+
+	map = (struct sock_mapping *)sock->sk->sk_send_head;
+	if (!map) {
+		pvcalls_exit();
+		return -ENOTSOCK;
+	}
+
+	spin_lock(&bedata->socket_lock);
+	ret = get_request(bedata, &req_id);
+	if (ret < 0) {
+		spin_unlock(&bedata->socket_lock);
+		pvcalls_exit();
+		return ret;
+	}
+	ret = create_active(map, &evtchn);
+	if (ret < 0) {
+		spin_unlock(&bedata->socket_lock);
+		pvcalls_exit();
+		return ret;
+	}
+
+	req = RING_GET_REQUEST(&bedata->ring, req_id);
+	req->req_id = req_id;
+	req->cmd = PVCALLS_CONNECT;
+	req->u.connect.id = (uint64_t)map;
+	req->u.connect.len = addr_len;
+	req->u.connect.flags = flags;
+	req->u.connect.ref = map->active.ref;
+	req->u.connect.evtchn = evtchn;
+	memcpy(req->u.connect.addr, addr, sizeof(*addr));
+
+	map->sock = sock;
+
+	bedata->ring.req_prod_pvt++;
+	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify);
+	spin_unlock(&bedata->socket_lock);
+
+	if (notify)
+		notify_remote_via_irq(bedata->irq);
+
+	wait_event(bedata->inflight_req,
+		   READ_ONCE(bedata->rsp[req_id].req_id) == req_id);
+
+	/* read req_id, then the content */
+	smp_rmb();
+	ret = bedata->rsp[req_id].ret;
+	bedata->rsp[req_id].req_id = PVCALLS_INVALID_ID;
+	pvcalls_exit();
+	return ret;
+}
+
 static const struct xenbus_device_id pvcalls_front_ids[] = {
 	{ "pvcalls" },
 	{ "" }
@@ -209,6 +363,14 @@ static int pvcalls_front_remove(struct xenbus_device *dev)
 	if (bedata->irq >= 0)
 		unbind_from_irqhandler(bedata->irq, dev);
 
+	list_for_each_entry_safe(map, n, &bedata->socket_mappings, list) {
+		map->sock->sk->sk_send_head = NULL;
+		if (map->active_socket) {
+			map->active.ring->in_error = -EBADF;
+			wake_up_interruptible(&map->active.inflight_conn_req);
+		}
+	}
+
 	smp_mb();
 	while (atomic_read(&pvcalls_refcount) > 0)
 		cpu_relax();
diff --git a/drivers/xen/pvcalls-front.h b/drivers/xen/pvcalls-front.h
index b7dabed..63b0417 100644
--- a/drivers/xen/pvcalls-front.h
+++ b/drivers/xen/pvcalls-front.h
@@ -4,5 +4,7 @@
 #include <linux/net.h>
 
 int pvcalls_front_socket(struct socket *sock);
+int pvcalls_front_connect(struct socket *sock, struct sockaddr *addr,
+			  int addr_len, int flags);
 
 #endif
-- 
1.9.1

* [PATCH v5 06/13] xen/pvcalls: implement bind command
  2017-10-07  0:30   ` Stefano Stabellini
@ 2017-10-07  0:30     ` Stefano Stabellini
  -1 siblings, 0 replies; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-07  0:30 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

Send PVCALLS_BIND to the backend. Introduce a new structure, part of
struct sock_mapping, to store information specific to passive sockets.

Introduce a status field to keep track of the status of the passive
socket.
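
For illustration, the status set here is expected to gate the listen
command roughly as follows (a sketch anticipating the listen patch later
in this series):

	if (map->passive.status != PVCALLS_STATUS_BIND)
		return -EOPNOTSUPP;
	/* ... send PVCALLS_LISTEN to the backend and wait for the reply ... */
	map->passive.status = PVCALLS_STATUS_LISTEN;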

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-front.c | 66 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/xen/pvcalls-front.h |  3 +++
 2 files changed, 69 insertions(+)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index 7c9261b..4cafd9b 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -71,6 +71,13 @@ struct sock_mapping {
 
 			wait_queue_head_t inflight_conn_req;
 		} active;
+		struct {
+		/* Socket status */
+#define PVCALLS_STATUS_UNINITALIZED  0
+#define PVCALLS_STATUS_BIND          1
+#define PVCALLS_STATUS_LISTEN        2
+			uint8_t status;
+		} passive;
 	};
 };
 
@@ -347,6 +354,65 @@ int pvcalls_front_connect(struct socket *sock, struct sockaddr *addr,
 	return ret;
 }
 
+int pvcalls_front_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
+{
+	struct pvcalls_bedata *bedata;
+	struct sock_mapping *map = NULL;
+	struct xen_pvcalls_request *req;
+	int notify, req_id, ret;
+
+	if (addr->sa_family != AF_INET || sock->type != SOCK_STREAM)
+		return -ENOTSUPP;
+
+	pvcalls_enter();
+	if (!pvcalls_front_dev) {
+		pvcalls_exit();
+		return -ENOTCONN;
+	}
+	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
+
+	map = (struct sock_mapping *) sock->sk->sk_send_head;
+	if (map == NULL) {
+		pvcalls_exit();
+		return -ENOTSOCK;
+	}
+
+	spin_lock(&bedata->socket_lock);
+	ret = get_request(bedata, &req_id);
+	if (ret < 0) {
+		spin_unlock(&bedata->socket_lock);
+		pvcalls_exit();
+		return ret;
+	}
+	req = RING_GET_REQUEST(&bedata->ring, req_id);
+	req->req_id = req_id;
+	map->sock = sock;
+	req->cmd = PVCALLS_BIND;
+	req->u.bind.id = (uint64_t)map;
+	memcpy(req->u.bind.addr, addr, sizeof(*addr));
+	req->u.bind.len = addr_len;
+
+	map->active_socket = false;
+
+	bedata->ring.req_prod_pvt++;
+	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify);
+	spin_unlock(&bedata->socket_lock);
+	if (notify)
+		notify_remote_via_irq(bedata->irq);
+
+	wait_event(bedata->inflight_req,
+		   READ_ONCE(bedata->rsp[req_id].req_id) == req_id);
+
+	/* read req_id, then the content */
+	smp_rmb();
+	ret = bedata->rsp[req_id].ret;
+	bedata->rsp[req_id].req_id = PVCALLS_INVALID_ID;
+
+	map->passive.status = PVCALLS_STATUS_BIND;
+	pvcalls_exit();
+	return 0;
+}
+
 static const struct xenbus_device_id pvcalls_front_ids[] = {
 	{ "pvcalls" },
 	{ "" }
diff --git a/drivers/xen/pvcalls-front.h b/drivers/xen/pvcalls-front.h
index 63b0417..8b0a274 100644
--- a/drivers/xen/pvcalls-front.h
+++ b/drivers/xen/pvcalls-front.h
@@ -6,5 +6,8 @@
 int pvcalls_front_socket(struct socket *sock);
 int pvcalls_front_connect(struct socket *sock, struct sockaddr *addr,
 			  int addr_len, int flags);
+int pvcalls_front_bind(struct socket *sock,
+		       struct sockaddr *addr,
+		       int addr_len);
 
 #endif
-- 
1.9.1


* [PATCH v5 07/13] xen/pvcalls: implement listen command
  2017-10-07  0:30   ` Stefano Stabellini
@ 2017-10-07  0:30     ` Stefano Stabellini
  -1 siblings, 0 replies; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-07  0:30 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

Send PVCALLS_LISTEN to the backend.
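
Every command in this series follows the same request/response
handshake: reserve a slot with get_request(), fill in the request, push
the ring and notify the backend, then wait until rsp[req_id].req_id
matches. As a rough user-space model only (invented names, a pthread
condition variable standing in for the waitqueue and event channel):

#include <pthread.h>
#include <stdio.h>

#define INVALID_ID 0xffffffffu

static unsigned int rsp_id[4] = {
	INVALID_ID, INVALID_ID, INVALID_ID, INVALID_ID
};
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t inflight = PTHREAD_COND_INITIALIZER;

/* stands in for the backend writing a response and kicking the frontend */
static void *backend(void *arg)
{
	unsigned int id = *(unsigned int *)arg;

	pthread_mutex_lock(&lock);
	rsp_id[id] = id;
	pthread_cond_broadcast(&inflight);
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* stands in for: fill request, push the ring, notify, wait for the reply */
static void issue_cmd(unsigned int id)
{
	pthread_t t;

	pthread_create(&t, NULL, backend, &id);
	pthread_mutex_lock(&lock);
	while (rsp_id[id] != id)
		pthread_cond_wait(&inflight, &lock);
	rsp_id[id] = INVALID_ID;	/* the slot can be reused */
	pthread_mutex_unlock(&lock);
	pthread_join(t, NULL);
}

int main(void)
{
	issue_cmd(1);
	printf("LISTEN completed\n");
	return 0;
}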

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-front.c | 57 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/xen/pvcalls-front.h |  1 +
 2 files changed, 58 insertions(+)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index 4cafd9b..5433fae 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -413,6 +413,63 @@ int pvcalls_front_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 	return 0;
 }
 
+int pvcalls_front_listen(struct socket *sock, int backlog)
+{
+	struct pvcalls_bedata *bedata;
+	struct sock_mapping *map;
+	struct xen_pvcalls_request *req;
+	int notify, req_id, ret;
+
+	pvcalls_enter();
+	if (!pvcalls_front_dev) {
+		pvcalls_exit();
+		return -ENOTCONN;
+	}
+	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
+
+	map = (struct sock_mapping *) sock->sk->sk_send_head;
+	if (!map) {
+		pvcalls_exit();
+		return -ENOTSOCK;
+	}
+
+	if (map->passive.status != PVCALLS_STATUS_BIND) {
+		pvcalls_exit();
+		return -EOPNOTSUPP;
+	}
+
+	spin_lock(&bedata->socket_lock);
+	ret = get_request(bedata, &req_id);
+	if (ret < 0) {
+		spin_unlock(&bedata->socket_lock);
+		pvcalls_exit();
+		return ret;
+	}
+	req = RING_GET_REQUEST(&bedata->ring, req_id);
+	req->req_id = req_id;
+	req->cmd = PVCALLS_LISTEN;
+	req->u.listen.id = (uint64_t) map;
+	req->u.listen.backlog = backlog;
+
+	bedata->ring.req_prod_pvt++;
+	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify);
+	spin_unlock(&bedata->socket_lock);
+	if (notify)
+		notify_remote_via_irq(bedata->irq);
+
+	wait_event(bedata->inflight_req,
+		   READ_ONCE(bedata->rsp[req_id].req_id) == req_id);
+
+	/* read req_id, then the content */
+	smp_rmb();
+	ret = bedata->rsp[req_id].ret;
+	bedata->rsp[req_id].req_id = PVCALLS_INVALID_ID;
+
+	map->passive.status = PVCALLS_STATUS_LISTEN;
+	pvcalls_exit();
+	return ret;
+}
+
 static const struct xenbus_device_id pvcalls_front_ids[] = {
 	{ "pvcalls" },
 	{ "" }
diff --git a/drivers/xen/pvcalls-front.h b/drivers/xen/pvcalls-front.h
index 8b0a274..aa8fe10 100644
--- a/drivers/xen/pvcalls-front.h
+++ b/drivers/xen/pvcalls-front.h
@@ -9,5 +9,6 @@ int pvcalls_front_connect(struct socket *sock, struct sockaddr *addr,
 int pvcalls_front_bind(struct socket *sock,
 		       struct sockaddr *addr,
 		       int addr_len);
+int pvcalls_front_listen(struct socket *sock, int backlog);
 
 #endif
-- 
1.9.1


* [PATCH v5 08/13] xen/pvcalls: implement accept command
  2017-10-07  0:30   ` Stefano Stabellini
                     ` (7 preceding siblings ...)
  (?)
@ 2017-10-07  0:30   ` Stefano Stabellini
  2017-10-17 18:34     ` Boris Ostrovsky
  2017-10-17 18:34     ` Boris Ostrovsky
  -1 siblings, 2 replies; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-07  0:30 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

Introduce a waitqueue to allow only one outstanding accept command at
any given time and to implement polling on the passive socket. Introduce
a flags field to keep track of in-flight accept and poll commands.

Send PVCALLS_ACCEPT to the backend. Allocate a new active socket. Make
sure that only one accept command is executed at any given time by
setting PVCALLS_FLAG_ACCEPT_INFLIGHT and waiting on the
inflight_accept_req waitqueue.

Convert the new struct sock_mapping pointer into a uint64_t and use it
as the id for the new socket to pass to the backend.

Check whether the accept call is non-blocking: in that case, after
sending the ACCEPT command to the backend, store the sock_mapping
pointer of the new struct and the inflight req_id, then return -EAGAIN
(the backend will only respond once there is something to accept). The
next time accept is called, we'll check whether the ACCEPT command has
been answered: if so, we'll pick up where we left off, otherwise we
return -EAGAIN again.

Note that, unlike the other commands, accept can use
wait_event_interruptible (instead of wait_event) because we are able to
track the req_id of the ACCEPT response that we are waiting for.
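
To illustrate the behaviour from the caller's point of view (plain POSIX
sockets here, nothing pvcalls-specific), a non-blocking accept that
keeps returning EAGAIN is normally driven like this:

#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/* retry accept4() until the pending connection is actually delivered */
static int accept_when_ready(int listen_fd)
{
	for (;;) {
		struct pollfd pfd = { .fd = listen_fd, .events = POLLIN };
		int fd = accept4(listen_fd, NULL, NULL, SOCK_NONBLOCK);

		if (fd >= 0)
			return fd;	/* the earlier request completed */
		if (errno != EAGAIN && errno != EWOULDBLOCK)
			return -1;

		/* nothing to accept yet: wait for the socket to be readable */
		poll(&pfd, 1, -1);
	}
}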

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-front.c | 146 ++++++++++++++++++++++++++++++++++++++++++++
 drivers/xen/pvcalls-front.h |   3 +
 2 files changed, 149 insertions(+)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index 5433fae..8958e74 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -77,6 +77,16 @@ struct sock_mapping {
 #define PVCALLS_STATUS_BIND          1
 #define PVCALLS_STATUS_LISTEN        2
 			uint8_t status;
+		/*
+		 * Internal state-machine flags.
+		 * Only one accept operation can be inflight for a socket.
+		 * Only one poll operation can be inflight for a given socket.
+		 */
+#define PVCALLS_FLAG_ACCEPT_INFLIGHT 0
+			uint8_t flags;
+			uint32_t inflight_req_id;
+			struct sock_mapping *accept_map;
+			wait_queue_head_t inflight_accept_req;
 		} passive;
 	};
 };
@@ -392,6 +402,8 @@ int pvcalls_front_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 	memcpy(req->u.bind.addr, addr, sizeof(*addr));
 	req->u.bind.len = addr_len;
 
+	init_waitqueue_head(&map->passive.inflight_accept_req);
+
 	map->active_socket = false;
 
 	bedata->ring.req_prod_pvt++;
@@ -470,6 +482,140 @@ int pvcalls_front_listen(struct socket *sock, int backlog)
 	return ret;
 }
 
+int pvcalls_front_accept(struct socket *sock, struct socket *newsock, int flags)
+{
+	struct pvcalls_bedata *bedata;
+	struct sock_mapping *map;
+	struct sock_mapping *map2 = NULL;
+	struct xen_pvcalls_request *req;
+	int notify, req_id, ret, evtchn, nonblock;
+
+	pvcalls_enter();
+	if (!pvcalls_front_dev) {
+		pvcalls_exit();
+		return -ENOTCONN;
+	}
+	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
+
+	map = (struct sock_mapping *) sock->sk->sk_send_head;
+	if (!map) {
+		pvcalls_exit();
+		return -ENOTSOCK;
+	}
+
+	if (map->passive.status != PVCALLS_STATUS_LISTEN) {
+		pvcalls_exit();
+		return -EINVAL;
+	}
+
+	nonblock = flags & SOCK_NONBLOCK;
+	/*
+	 * Backend only supports 1 inflight accept request, will return
+	 * errors for the others
+	 */
+	if (test_and_set_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
+			     (void *)&map->passive.flags)) {
+		req_id = READ_ONCE(map->passive.inflight_req_id);
+		if (req_id != PVCALLS_INVALID_ID &&
+		    READ_ONCE(bedata->rsp[req_id].req_id) == req_id) {
+			map2 = map->passive.accept_map;
+			goto received;
+		}
+		if (nonblock) {
+			pvcalls_exit();
+			return -EAGAIN;
+		}
+		if (wait_event_interruptible(map->passive.inflight_accept_req,
+			!test_and_set_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
+					  (void *)&map->passive.flags))) {
+			pvcalls_exit();
+			return -EINTR;
+		}
+	}
+
+	spin_lock(&bedata->socket_lock);
+	ret = get_request(bedata, &req_id);
+	if (ret < 0) {
+		clear_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
+			  (void *)&map->passive.flags);
+		spin_unlock(&bedata->socket_lock);
+		pvcalls_exit();
+		return ret;
+	}
+	map2 = kzalloc(sizeof(*map2), GFP_KERNEL);
+	if (map2 == NULL) {
+		clear_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
+			  (void *)&map->passive.flags);
+		spin_unlock(&bedata->socket_lock);
+		pvcalls_exit();
+		return -ENOMEM;
+	}
+	ret =  create_active(map2, &evtchn);
+	if (ret < 0) {
+		kfree(map2);
+		clear_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
+			  (void *)&map->passive.flags);
+		spin_unlock(&bedata->socket_lock);
+		pvcalls_exit();
+		return -ENOMEM;
+	}
+	list_add_tail(&map2->list, &bedata->socket_mappings);
+
+	req = RING_GET_REQUEST(&bedata->ring, req_id);
+	req->req_id = req_id;
+	req->cmd = PVCALLS_ACCEPT;
+	req->u.accept.id = (uint64_t) map;
+	req->u.accept.ref = map2->active.ref;
+	req->u.accept.id_new = (uint64_t) map2;
+	req->u.accept.evtchn = evtchn;
+	map->passive.accept_map = map2;
+
+	bedata->ring.req_prod_pvt++;
+	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify);
+	spin_unlock(&bedata->socket_lock);
+	if (notify)
+		notify_remote_via_irq(bedata->irq);
+	/* We could check if we have received a response before returning. */
+	if (nonblock) {
+		WRITE_ONCE(map->passive.inflight_req_id, req_id);
+		pvcalls_exit();
+		return -EAGAIN;
+	}
+
+	if (wait_event_interruptible(bedata->inflight_req,
+		READ_ONCE(bedata->rsp[req_id].req_id) == req_id)) {
+		pvcalls_exit();
+		return -EINTR;
+	}
+	/* read req_id, then the content */
+	smp_rmb();
+
+received:
+	map2->sock = newsock;
+	newsock->sk = kzalloc(sizeof(*newsock->sk), GFP_KERNEL);
+	if (!newsock->sk) {
+		bedata->rsp[req_id].req_id = PVCALLS_INVALID_ID;
+		map->passive.inflight_req_id = PVCALLS_INVALID_ID;
+		clear_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
+			  (void *)&map->passive.flags);
+		pvcalls_front_free_map(bedata, map2);
+		kfree(map2);
+		pvcalls_exit();
+		return -ENOMEM;
+	}
+	newsock->sk->sk_send_head = (void *)map2;
+
+	ret = bedata->rsp[req_id].ret;
+	bedata->rsp[req_id].req_id = PVCALLS_INVALID_ID;
+	map->passive.inflight_req_id = PVCALLS_INVALID_ID;
+
+	clear_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT, (void *)&map->passive.flags);
+	wake_up(&map->passive.inflight_accept_req);
+
+	pvcalls_exit();
+	return ret;
+}
+
 static const struct xenbus_device_id pvcalls_front_ids[] = {
 	{ "pvcalls" },
 	{ "" }
diff --git a/drivers/xen/pvcalls-front.h b/drivers/xen/pvcalls-front.h
index aa8fe10..ab4f1da 100644
--- a/drivers/xen/pvcalls-front.h
+++ b/drivers/xen/pvcalls-front.h
@@ -10,5 +10,8 @@ int pvcalls_front_bind(struct socket *sock,
 		       struct sockaddr *addr,
 		       int addr_len);
 int pvcalls_front_listen(struct socket *sock, int backlog);
+int pvcalls_front_accept(struct socket *sock,
+			 struct socket *newsock,
+			 int flags);
 
 #endif
-- 
1.9.1


* [PATCH v5 09/13] xen/pvcalls: implement sendmsg
  2017-10-07  0:30   ` Stefano Stabellini
@ 2017-10-07  0:30     ` Stefano Stabellini
  -1 siblings, 0 replies; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-07  0:30 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

Send data to an active socket by copying data to the "out" ring. Take
the active socket out_mutex so that only one function can access the
ring at any given time.

If not enough room is available on the ring, rather than returning
immediately or sleep-waiting, keep retrying the write for up to 5000
iterations (PVCALLS_FRONT_MAX_SPIN). This small optimization turns out
to improve performance significantly.
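
Not part of the patch, but to make the index arithmetic in __write_ring()
below easier to follow, here is a stand-alone model of the masked,
possibly wrapping copy. It assumes free-running producer/consumer
indices and a power-of-two ring size, which is what XEN_FLEX_RING_SIZE()
provides:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define RING_SIZE 8u	/* tiny ring so the wrap-around is easy to see */

static uint32_t ring_write(uint8_t *ring, uint32_t prod, uint32_t cons,
			   const uint8_t *buf, uint32_t len)
{
	uint32_t queued = prod - cons;		/* bytes already on the ring */
	uint32_t avail = RING_SIZE - queued;	/* room left for new data */
	uint32_t masked_prod = prod & (RING_SIZE - 1);

	if (len > avail)
		len = avail;
	if (len > RING_SIZE - masked_prod) {	/* the copy wraps past the end */
		uint32_t chunk = RING_SIZE - masked_prod;

		memcpy(ring + masked_prod, buf, chunk);
		memcpy(ring, buf + chunk, len - chunk);
	} else {
		memcpy(ring + masked_prod, buf, len);
	}
	return len;	/* the caller advances prod by this much */
}

int main(void)
{
	uint8_t ring[RING_SIZE];

	/* prod=6, cons=2: 4 bytes queued, 4 free; "abcd" is split 2 + 2 */
	printf("wrote %u bytes\n",
	       ring_write(ring, 6, 2, (const uint8_t *)"abcd", 4));
	return 0;
}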

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-front.c | 118 ++++++++++++++++++++++++++++++++++++++++++++
 drivers/xen/pvcalls-front.h |   3 ++
 2 files changed, 121 insertions(+)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index 8958e74..c13c40a 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -29,6 +29,7 @@
 #define PVCALLS_INVALID_ID UINT_MAX
 #define PVCALLS_RING_ORDER XENBUS_MAX_RING_GRANT_ORDER
 #define PVCALLS_NR_REQ_PER_RING __CONST_RING_SIZE(xen_pvcalls, XEN_PAGE_SIZE)
+#define PVCALLS_FRONT_MAX_SPIN 5000
 
 struct pvcalls_bedata {
 	struct xen_pvcalls_front_ring ring;
@@ -100,6 +101,23 @@ static inline int get_request(struct pvcalls_bedata *bedata, int *req_id)
 	return 0;
 }
 
+static bool pvcalls_front_write_todo(struct sock_mapping *map)
+{
+	struct pvcalls_data_intf *intf = map->active.ring;
+	RING_IDX cons, prod, size = XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER);
+	int32_t error;
+
+	error = intf->out_error;
+	if (error == -ENOTCONN)
+		return false;
+	if (error != 0)
+		return true;
+
+	cons = intf->out_cons;
+	prod = intf->out_prod;
+	return !!(size - pvcalls_queued(prod, cons, size));
+}
+
 static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
 {
 	struct xenbus_device *dev = dev_id;
@@ -364,6 +382,106 @@ int pvcalls_front_connect(struct socket *sock, struct sockaddr *addr,
 	return ret;
 }
 
+static int __write_ring(struct pvcalls_data_intf *intf,
+			struct pvcalls_data *data,
+			struct iov_iter *msg_iter,
+			int len)
+{
+	RING_IDX cons, prod, size, masked_prod, masked_cons;
+	RING_IDX array_size = XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER);
+	int32_t error;
+
+	error = intf->out_error;
+	if (error < 0)
+		return error;
+	cons = intf->out_cons;
+	prod = intf->out_prod;
+	/* read indexes before continuing */
+	virt_mb();
+
+	size = pvcalls_queued(prod, cons, array_size);
+	if (size >= array_size)
+		return 0;
+	if (len > array_size - size)
+		len = array_size - size;
+
+	masked_prod = pvcalls_mask(prod, array_size);
+	masked_cons = pvcalls_mask(cons, array_size);
+
+	if (masked_prod < masked_cons) {
+		copy_from_iter(data->out + masked_prod, len, msg_iter);
+	} else {
+		if (len > array_size - masked_prod) {
+			copy_from_iter(data->out + masked_prod,
+				       array_size - masked_prod, msg_iter);
+			copy_from_iter(data->out,
+				       len - (array_size - masked_prod),
+				       msg_iter);
+		} else {
+			copy_from_iter(data->out + masked_prod, len, msg_iter);
+		}
+	}
+	/* write to ring before updating pointer */
+	virt_wmb();
+	intf->out_prod += len;
+
+	return len;
+}
+
+int pvcalls_front_sendmsg(struct socket *sock, struct msghdr *msg,
+			  size_t len)
+{
+	struct pvcalls_bedata *bedata;
+	struct sock_mapping *map;
+	int sent, tot_sent = 0;
+	int count = 0, flags;
+
+	flags = msg->msg_flags;
+	if (flags & (MSG_CONFIRM|MSG_DONTROUTE|MSG_EOR|MSG_OOB))
+		return -EOPNOTSUPP;
+
+	pvcalls_enter();
+	if (!pvcalls_front_dev) {
+		pvcalls_exit();
+		return -ENOTCONN;
+	}
+	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
+
+	map = (struct sock_mapping *) sock->sk->sk_send_head;
+	if (!map) {
+		pvcalls_exit();
+		return -ENOTSOCK;
+	}
+
+	mutex_lock(&map->active.out_mutex);
+	if ((flags & MSG_DONTWAIT) && !pvcalls_front_write_todo(map)) {
+		mutex_unlock(&map->active.out_mutex);
+		pvcalls_exit();
+		return -EAGAIN;
+	}
+	if (len > INT_MAX)
+		len = INT_MAX;
+
+again:
+	count++;
+	sent = __write_ring(map->active.ring,
+			    &map->active.data, &msg->msg_iter,
+			    len);
+	if (sent > 0) {
+		len -= sent;
+		tot_sent += sent;
+		notify_remote_via_irq(map->active.irq);
+	}
+	if (sent >= 0 && len > 0 && count < PVCALLS_FRONT_MAX_SPIN)
+		goto again;
+	if (sent < 0)
+		tot_sent = sent;
+
+	mutex_unlock(&map->active.out_mutex);
+	pvcalls_exit();
+	return tot_sent;
+}
+
 int pvcalls_front_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 {
 	struct pvcalls_bedata *bedata;
diff --git a/drivers/xen/pvcalls-front.h b/drivers/xen/pvcalls-front.h
index ab4f1da..d937c24 100644
--- a/drivers/xen/pvcalls-front.h
+++ b/drivers/xen/pvcalls-front.h
@@ -13,5 +13,8 @@ int pvcalls_front_bind(struct socket *sock,
 int pvcalls_front_accept(struct socket *sock,
 			 struct socket *newsock,
 			 int flags);
+int pvcalls_front_sendmsg(struct socket *sock,
+			  struct msghdr *msg,
+			  size_t len);
 
 #endif
-- 
1.9.1


* [PATCH v5 10/13] xen/pvcalls: implement recvmsg
  2017-10-07  0:30   ` Stefano Stabellini
                     ` (11 preceding siblings ...)
  (?)
@ 2017-10-07  0:30   ` Stefano Stabellini
  2017-10-17 21:35     ` Boris Ostrovsky
  2017-10-17 21:35     ` Boris Ostrovsky
  -1 siblings, 2 replies; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-07  0:30 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

Implement recvmsg by copying data from the "in" ring. If not enough data
is available and the recvmsg call is blocking, then wait on the
inflight_conn_req waitqueue. Take the active socket in_mutex so that
only one function can access the ring at any given time.
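
As a caller-side illustration only (ordinary POSIX sockets, not
pvcalls), these are the semantics the return-value fixups at the end of
pvcalls_front_recvmsg() preserve: EAGAIN when nothing is queued on a
non-blocking read, and 0 when the peer has closed the connection:

#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* one non-blocking read attempt; the outcome mapping mirrors recvmsg() */
static ssize_t read_once(int fd, void *buf, size_t len)
{
	ssize_t n = recv(fd, buf, len, MSG_DONTWAIT);

	if (n > 0)
		return n;		/* data was queued on the "in" ring */
	if (n == 0)
		return 0;		/* orderly shutdown by the peer */
	if (errno == EAGAIN || errno == EWOULDBLOCK)
		return -EAGAIN;		/* nothing to read yet, try later */
	return -1;			/* some other error */
}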

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-front.c | 108 ++++++++++++++++++++++++++++++++++++++++++++
 drivers/xen/pvcalls-front.h |   4 ++
 2 files changed, 112 insertions(+)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index c13c40a..161f88b 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -118,6 +118,20 @@ static bool pvcalls_front_write_todo(struct sock_mapping *map)
 	return !!(size - pvcalls_queued(prod, cons, size));
 }
 
+static bool pvcalls_front_read_todo(struct sock_mapping *map)
+{
+	struct pvcalls_data_intf *intf = map->active.ring;
+	RING_IDX cons, prod;
+	int32_t error;
+
+	cons = intf->in_cons;
+	prod = intf->in_prod;
+	error = intf->in_error;
+	return (error != 0 ||
+		pvcalls_queued(prod, cons,
+			       XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER)) != 0);
+}
+
 static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
 {
 	struct xenbus_device *dev = dev_id;
@@ -482,6 +496,100 @@ int pvcalls_front_sendmsg(struct socket *sock, struct msghdr *msg,
 	return tot_sent;
 }
 
+static int __read_ring(struct pvcalls_data_intf *intf,
+		       struct pvcalls_data *data,
+		       struct iov_iter *msg_iter,
+		       size_t len, int flags)
+{
+	RING_IDX cons, prod, size, masked_prod, masked_cons;
+	RING_IDX array_size = XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER);
+	int32_t error;
+
+	cons = intf->in_cons;
+	prod = intf->in_prod;
+	error = intf->in_error;
+	/* get pointers before reading from the ring */
+	virt_rmb();
+	if (error < 0)
+		return error;
+
+	size = pvcalls_queued(prod, cons, array_size);
+	masked_prod = pvcalls_mask(prod, array_size);
+	masked_cons = pvcalls_mask(cons, array_size);
+
+	if (size == 0)
+		return 0;
+
+	if (len > size)
+		len = size;
+
+	if (masked_prod > masked_cons) {
+		copy_to_iter(data->in + masked_cons, len, msg_iter);
+	} else {
+		if (len > (array_size - masked_cons)) {
+			copy_to_iter(data->in + masked_cons,
+				     array_size - masked_cons, msg_iter);
+			copy_to_iter(data->in,
+				     len - (array_size - masked_cons),
+				     msg_iter);
+		} else {
+			copy_to_iter(data->in + masked_cons, len, msg_iter);
+		}
+	}
+	/* read data from the ring before increasing the index */
+	virt_mb();
+	if (!(flags & MSG_PEEK))
+		intf->in_cons += len;
+
+	return len;
+}
+
+int pvcalls_front_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
+		     int flags)
+{
+	struct pvcalls_bedata *bedata;
+	int ret;
+	struct sock_mapping *map;
+
+	if (flags & (MSG_CMSG_CLOEXEC|MSG_ERRQUEUE|MSG_OOB|MSG_TRUNC))
+		return -EOPNOTSUPP;
+
+	pvcalls_enter();
+	if (!pvcalls_front_dev) {
+		pvcalls_exit();
+		return -ENOTCONN;
+	}
+	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
+
+	map = (struct sock_mapping *) sock->sk->sk_send_head;
+	if (!map) {
+		pvcalls_exit();
+		return -ENOTSOCK;
+	}
+
+	mutex_lock(&map->active.in_mutex);
+	if (len > XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER))
+		len = XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER);
+
+	while (!(flags & MSG_DONTWAIT) && !pvcalls_front_read_todo(map)) {
+		wait_event_interruptible(map->active.inflight_conn_req,
+					 pvcalls_front_read_todo(map));
+	}
+	ret = __read_ring(map->active.ring, &map->active.data,
+			  &msg->msg_iter, len, flags);
+
+	if (ret > 0)
+		notify_remote_via_irq(map->active.irq);
+	if (ret == 0)
+		ret = -EAGAIN;
+	if (ret == -ENOTCONN)
+		ret = 0;
+
+	mutex_unlock(&map->active.in_mutex);
+	pvcalls_exit();
+	return ret;
+}
+
 int pvcalls_front_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 {
 	struct pvcalls_bedata *bedata;
diff --git a/drivers/xen/pvcalls-front.h b/drivers/xen/pvcalls-front.h
index d937c24..de24041 100644
--- a/drivers/xen/pvcalls-front.h
+++ b/drivers/xen/pvcalls-front.h
@@ -16,5 +16,9 @@ int pvcalls_front_accept(struct socket *sock,
 int pvcalls_front_sendmsg(struct socket *sock,
 			  struct msghdr *msg,
 			  size_t len);
+int pvcalls_front_recvmsg(struct socket *sock,
+			  struct msghdr *msg,
+			  size_t len,
+			  int flags);
 
 #endif
-- 
1.9.1


* [PATCH v5 11/13] xen/pvcalls: implement poll command
  2017-10-07  0:30   ` Stefano Stabellini
@ 2017-10-07  0:30     ` Stefano Stabellini
  -1 siblings, 0 replies; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-07  0:30 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

For active sockets, check the indexes and use the inflight_conn_req
waitqueue to wait.

For passive sockets if an accept is outstanding
(PVCALLS_FLAG_ACCEPT_INFLIGHT), check if it has been answered by looking
at bedata->rsp[req_id]. If so, return POLLIN.  Otherwise use the
inflight_accept_req waitqueue.

If no accepts are inflight, send PVCALLS_POLL to the backend. If we have
outstanding POLL requests awaiting a response, use the inflight_req
waitqueue: inflight_req is woken up when a new response is received; on
wakeup we check whether the POLL response has arrived by looking at the
PVCALLS_FLAG_POLL_RET flag. We set the flag from
pvcalls_front_event_handler if the response was for a POLL command.

In pvcalls_front_event_handler, get the struct sock_mapping from the
poll id (we previously converted struct sock_mapping* to uint64_t and
used it as the id).
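
A stand-alone model of that flag ordering (illustration only; C11
atomics stand in for the bitops and barriers, and the names are
invented):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_bool poll_inflight;
static atomic_bool poll_ret;

/* event handler side: clear INFLIGHT, then set RET */
static void poll_response_arrived(void)
{
	atomic_store(&poll_inflight, false);
	atomic_store(&poll_ret, true);
}

/* poller side: test RET first, then try to claim INFLIGHT */
static bool poll_passive_check(void)
{
	if (atomic_exchange(&poll_ret, false))
		return true;	/* a previously issued POLL has completed */
	if (atomic_exchange(&poll_inflight, true))
		return false;	/* a POLL request is already pending */
	/* ...this is where a new PVCALLS_POLL request would be sent... */
	return false;
}

int main(void)
{
	printf("%d\n", poll_passive_check());	/* 0: first call sends a request */
	poll_response_arrived();
	printf("%d\n", poll_passive_check());	/* 1: the response is now visible */
	return 0;
}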

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-front.c | 144 +++++++++++++++++++++++++++++++++++++++++---
 drivers/xen/pvcalls-front.h |   3 +
 2 files changed, 138 insertions(+), 9 deletions(-)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index 161f88b..aca2b32 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -84,6 +84,8 @@ struct sock_mapping {
 		 * Only one poll operation can be inflight for a given socket.
 		 */
 #define PVCALLS_FLAG_ACCEPT_INFLIGHT 0
+#define PVCALLS_FLAG_POLL_INFLIGHT   1
+#define PVCALLS_FLAG_POLL_RET        2
 			uint8_t flags;
 			uint32_t inflight_req_id;
 			struct sock_mapping *accept_map;
@@ -155,15 +157,32 @@ static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
 		rsp = RING_GET_RESPONSE(&bedata->ring, bedata->ring.rsp_cons);
 
 		req_id = rsp->req_id;
-		dst = (uint8_t *)&bedata->rsp[req_id] + sizeof(rsp->req_id);
-		src = (uint8_t *)rsp + sizeof(rsp->req_id);
-		memcpy(dst, src, sizeof(*rsp) - sizeof(rsp->req_id));
-		/*
-		 * First copy the rest of the data, then req_id. It is
-		 * paired with the barrier when accessing bedata->rsp.
-		 */
-		smp_wmb();
-		bedata->rsp[req_id].req_id = rsp->req_id;
+		if (rsp->cmd == PVCALLS_POLL) {
+			struct sock_mapping *map = (struct sock_mapping *)
+						   rsp->u.poll.id;
+
+			clear_bit(PVCALLS_FLAG_POLL_INFLIGHT,
+				  (void *)&map->passive.flags);
+			/*
+			 * clear INFLIGHT, then set RET. It pairs with
+			 * the checks at the beginning of
+			 * pvcalls_front_poll_passive.
+			 */
+			smp_wmb();
+			set_bit(PVCALLS_FLAG_POLL_RET,
+				(void *)&map->passive.flags);
+		} else {
+			dst = (uint8_t *)&bedata->rsp[req_id] +
+			      sizeof(rsp->req_id);
+			src = (uint8_t *)rsp + sizeof(rsp->req_id);
+			memcpy(dst, src, sizeof(*rsp) - sizeof(rsp->req_id));
+			/*
+			 * First copy the rest of the data, then req_id. It is
+			 * paired with the barrier when accessing bedata->rsp.
+			 */
+			smp_wmb();
+			bedata->rsp[req_id].req_id = req_id;
+		}
 
 		done = 1;
 		bedata->ring.rsp_cons++;
@@ -842,6 +861,113 @@ int pvcalls_front_accept(struct socket *sock, struct socket *newsock, int flags)
 	return ret;
 }
 
+static unsigned int pvcalls_front_poll_passive(struct file *file,
+					       struct pvcalls_bedata *bedata,
+					       struct sock_mapping *map,
+					       poll_table *wait)
+{
+	int notify, req_id, ret;
+	struct xen_pvcalls_request *req;
+
+	if (test_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
+		     (void *)&map->passive.flags)) {
+		uint32_t req_id = READ_ONCE(map->passive.inflight_req_id);
+
+		if (req_id != PVCALLS_INVALID_ID &&
+		    READ_ONCE(bedata->rsp[req_id].req_id) == req_id)
+			return POLLIN | POLLRDNORM;
+
+		poll_wait(file, &map->passive.inflight_accept_req, wait);
+		return 0;
+	}
+
+	if (test_and_clear_bit(PVCALLS_FLAG_POLL_RET,
+			       (void *)&map->passive.flags))
+		return POLLIN | POLLRDNORM;
+
+	/*
+	 * First check RET, then INFLIGHT. No barriers necessary to
+	 * ensure execution ordering because of the conditional
+	 * instructions creating control dependencies.
+	 */
+
+	if (test_and_set_bit(PVCALLS_FLAG_POLL_INFLIGHT,
+			     (void *)&map->passive.flags)) {
+		poll_wait(file, &bedata->inflight_req, wait);
+		return 0;
+	}
+
+	spin_lock(&bedata->socket_lock);
+	ret = get_request(bedata, &req_id);
+	if (ret < 0) {
+		spin_unlock(&bedata->socket_lock);
+		return ret;
+	}
+	req = RING_GET_REQUEST(&bedata->ring, req_id);
+	req->req_id = req_id;
+	req->cmd = PVCALLS_POLL;
+	req->u.poll.id = (uint64_t) map;
+
+	bedata->ring.req_prod_pvt++;
+	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify);
+	spin_unlock(&bedata->socket_lock);
+	if (notify)
+		notify_remote_via_irq(bedata->irq);
+
+	poll_wait(file, &bedata->inflight_req, wait);
+	return 0;
+}
+
+static unsigned int pvcalls_front_poll_active(struct file *file,
+					      struct pvcalls_bedata *bedata,
+					      struct sock_mapping *map,
+					      poll_table *wait)
+{
+	unsigned int mask = 0;
+	int32_t in_error, out_error;
+	struct pvcalls_data_intf *intf = map->active.ring;
+
+	out_error = intf->out_error;
+	in_error = intf->in_error;
+
+	poll_wait(file, &map->active.inflight_conn_req, wait);
+	if (pvcalls_front_write_todo(map))
+		mask |= POLLOUT | POLLWRNORM;
+	if (pvcalls_front_read_todo(map))
+		mask |= POLLIN | POLLRDNORM;
+	if (in_error != 0 || out_error != 0)
+		mask |= POLLERR;
+
+	return mask;
+}
+
+unsigned int pvcalls_front_poll(struct file *file, struct socket *sock,
+			       poll_table *wait)
+{
+	struct pvcalls_bedata *bedata;
+	struct sock_mapping *map;
+	int ret;
+
+	pvcalls_enter();
+	if (!pvcalls_front_dev) {
+		pvcalls_exit();
+		return POLLNVAL;
+	}
+	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
+
+	map = (struct sock_mapping *) sock->sk->sk_send_head;
+	if (!map) {
+		pvcalls_exit();
+		return POLLNVAL;
+	}
+	if (map->active_socket)
+		ret = pvcalls_front_poll_active(file, bedata, map, wait);
+	else
+		ret = pvcalls_front_poll_passive(file, bedata, map, wait);
+	pvcalls_exit();
+	return ret;
+}
+
 static const struct xenbus_device_id pvcalls_front_ids[] = {
 	{ "pvcalls" },
 	{ "" }
diff --git a/drivers/xen/pvcalls-front.h b/drivers/xen/pvcalls-front.h
index de24041..25e05b8 100644
--- a/drivers/xen/pvcalls-front.h
+++ b/drivers/xen/pvcalls-front.h
@@ -20,5 +20,8 @@ int pvcalls_front_recvmsg(struct socket *sock,
 			  struct msghdr *msg,
 			  size_t len,
 			  int flags);
+unsigned int pvcalls_front_poll(struct file *file,
+				struct socket *sock,
+				poll_table *wait);
 
 #endif
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v5 11/13] xen/pvcalls: implement poll command
@ 2017-10-07  0:30     ` Stefano Stabellini
  0 siblings, 0 replies; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-07  0:30 UTC (permalink / raw)
  To: xen-devel
  Cc: jgross, Stefano Stabellini, boris.ostrovsky, sstabellini, linux-kernel

For active sockets, check the indexes and use the inflight_conn_req
waitqueue to wait.

For passive sockets if an accept is outstanding
(PVCALLS_FLAG_ACCEPT_INFLIGHT), check if it has been answered by looking
at bedata->rsp[req_id]. If so, return POLLIN.  Otherwise use the
inflight_accept_req waitqueue.

If no accepts are inflight, send PVCALLS_POLL to the backend. If we have
outstanding POLL requests awaiting for a response use the inflight_req
waitqueue: inflight_req is awaken when a new response is received; on
wakeup we check whether the POLL response is arrived by looking at the
PVCALLS_FLAG_POLL_RET flag. We set the flag from
pvcalls_front_event_handler, if the response was for a POLL command.

In pvcalls_front_event_handler, get the struct sock_mapping from the
poll id (we previously converted struct sock_mapping* to uint64_t and
used it as id).

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-front.c | 144 +++++++++++++++++++++++++++++++++++++++++---
 drivers/xen/pvcalls-front.h |   3 +
 2 files changed, 138 insertions(+), 9 deletions(-)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index 161f88b..aca2b32 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -84,6 +84,8 @@ struct sock_mapping {
 		 * Only one poll operation can be inflight for a given socket.
 		 */
 #define PVCALLS_FLAG_ACCEPT_INFLIGHT 0
+#define PVCALLS_FLAG_POLL_INFLIGHT   1
+#define PVCALLS_FLAG_POLL_RET        2
 			uint8_t flags;
 			uint32_t inflight_req_id;
 			struct sock_mapping *accept_map;
@@ -155,15 +157,32 @@ static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
 		rsp = RING_GET_RESPONSE(&bedata->ring, bedata->ring.rsp_cons);
 
 		req_id = rsp->req_id;
-		dst = (uint8_t *)&bedata->rsp[req_id] + sizeof(rsp->req_id);
-		src = (uint8_t *)rsp + sizeof(rsp->req_id);
-		memcpy(dst, src, sizeof(*rsp) - sizeof(rsp->req_id));
-		/*
-		 * First copy the rest of the data, then req_id. It is
-		 * paired with the barrier when accessing bedata->rsp.
-		 */
-		smp_wmb();
-		bedata->rsp[req_id].req_id = rsp->req_id;
+		if (rsp->cmd == PVCALLS_POLL) {
+			struct sock_mapping *map = (struct sock_mapping *)
+						   rsp->u.poll.id;
+
+			clear_bit(PVCALLS_FLAG_POLL_INFLIGHT,
+				  (void *)&map->passive.flags);
+			/*
+			 * clear INFLIGHT, then set RET. It pairs with
+			 * the checks at the beginning of
+			 * pvcalls_front_poll_passive.
+			 */
+			smp_wmb();
+			set_bit(PVCALLS_FLAG_POLL_RET,
+				(void *)&map->passive.flags);
+		} else {
+			dst = (uint8_t *)&bedata->rsp[req_id] +
+			      sizeof(rsp->req_id);
+			src = (uint8_t *)rsp + sizeof(rsp->req_id);
+			memcpy(dst, src, sizeof(*rsp) - sizeof(rsp->req_id));
+			/*
+			 * First copy the rest of the data, then req_id. It is
+			 * paired with the barrier when accessing bedata->rsp.
+			 */
+			smp_wmb();
+			bedata->rsp[req_id].req_id = req_id;
+		}
 
 		done = 1;
 		bedata->ring.rsp_cons++;
@@ -842,6 +861,113 @@ int pvcalls_front_accept(struct socket *sock, struct socket *newsock, int flags)
 	return ret;
 }
 
+static unsigned int pvcalls_front_poll_passive(struct file *file,
+					       struct pvcalls_bedata *bedata,
+					       struct sock_mapping *map,
+					       poll_table *wait)
+{
+	int notify, req_id, ret;
+	struct xen_pvcalls_request *req;
+
+	if (test_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
+		     (void *)&map->passive.flags)) {
+		uint32_t req_id = READ_ONCE(map->passive.inflight_req_id);
+
+		if (req_id != PVCALLS_INVALID_ID &&
+		    READ_ONCE(bedata->rsp[req_id].req_id) == req_id)
+			return POLLIN | POLLRDNORM;
+
+		poll_wait(file, &map->passive.inflight_accept_req, wait);
+		return 0;
+	}
+
+	if (test_and_clear_bit(PVCALLS_FLAG_POLL_RET,
+			       (void *)&map->passive.flags))
+		return POLLIN | POLLRDNORM;
+
+	/*
+	 * First check RET, then INFLIGHT. No barriers necessary to
+	 * ensure execution ordering because of the conditional
+	 * instructions creating control dependencies.
+	 */
+
+	if (test_and_set_bit(PVCALLS_FLAG_POLL_INFLIGHT,
+			     (void *)&map->passive.flags)) {
+		poll_wait(file, &bedata->inflight_req, wait);
+		return 0;
+	}
+
+	spin_lock(&bedata->socket_lock);
+	ret = get_request(bedata, &req_id);
+	if (ret < 0) {
+		spin_unlock(&bedata->socket_lock);
+		return ret;
+	}
+	req = RING_GET_REQUEST(&bedata->ring, req_id);
+	req->req_id = req_id;
+	req->cmd = PVCALLS_POLL;
+	req->u.poll.id = (uint64_t) map;
+
+	bedata->ring.req_prod_pvt++;
+	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify);
+	spin_unlock(&bedata->socket_lock);
+	if (notify)
+		notify_remote_via_irq(bedata->irq);
+
+	poll_wait(file, &bedata->inflight_req, wait);
+	return 0;
+}
+
+static unsigned int pvcalls_front_poll_active(struct file *file,
+					      struct pvcalls_bedata *bedata,
+					      struct sock_mapping *map,
+					      poll_table *wait)
+{
+	unsigned int mask = 0;
+	int32_t in_error, out_error;
+	struct pvcalls_data_intf *intf = map->active.ring;
+
+	out_error = intf->out_error;
+	in_error = intf->in_error;
+
+	poll_wait(file, &map->active.inflight_conn_req, wait);
+	if (pvcalls_front_write_todo(map))
+		mask |= POLLOUT | POLLWRNORM;
+	if (pvcalls_front_read_todo(map))
+		mask |= POLLIN | POLLRDNORM;
+	if (in_error != 0 || out_error != 0)
+		mask |= POLLERR;
+
+	return mask;
+}
+
+unsigned int pvcalls_front_poll(struct file *file, struct socket *sock,
+			       poll_table *wait)
+{
+	struct pvcalls_bedata *bedata;
+	struct sock_mapping *map;
+	int ret;
+
+	pvcalls_enter();
+	if (!pvcalls_front_dev) {
+		pvcalls_exit();
+		return POLLNVAL;
+	}
+	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
+
+	map = (struct sock_mapping *) sock->sk->sk_send_head;
+	if (!map) {
+		pvcalls_exit();
+		return POLLNVAL;
+	}
+	if (map->active_socket)
+		ret = pvcalls_front_poll_active(file, bedata, map, wait);
+	else
+		ret = pvcalls_front_poll_passive(file, bedata, map, wait);
+	pvcalls_exit();
+	return ret;
+}
+
 static const struct xenbus_device_id pvcalls_front_ids[] = {
 	{ "pvcalls" },
 	{ "" }
diff --git a/drivers/xen/pvcalls-front.h b/drivers/xen/pvcalls-front.h
index de24041..25e05b8 100644
--- a/drivers/xen/pvcalls-front.h
+++ b/drivers/xen/pvcalls-front.h
@@ -20,5 +20,8 @@ int pvcalls_front_recvmsg(struct socket *sock,
 			  struct msghdr *msg,
 			  size_t len,
 			  int flags);
+unsigned int pvcalls_front_poll(struct file *file,
+				struct socket *sock,
+				poll_table *wait);
 
 #endif
-- 
1.9.1



^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v5 12/13] xen/pvcalls: implement release command
  2017-10-07  0:30   ` Stefano Stabellini
@ 2017-10-07  0:30     ` Stefano Stabellini
  -1 siblings, 0 replies; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-07  0:30 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

Send PVCALLS_RELEASE to the backend and wait for a reply. Take both
in_mutex and out_mutex to avoid concurrent accesses. Then, free the
socket.

For passive sockets, check whether we have already pre-allocated an
active socket for the purpose of being accepted. If so, free that as
well.
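
For an active socket the teardown ordering is roughly (a sketch of the
intent, not the literal code below):

    /* 1. force readers out */
    map->active.ring->in_error = -EBADF;
    wake_up_interruptible(&map->active.inflight_conn_req);

    /* 2. wait until in-flight sendmsg/recvmsg drop in_mutex/out_mutex;
     *    sk_send_head is already NULL, so no new callers can arrive */

    /* 3. tear down grants, irq and the data ring, then free the map */
    pvcalls_front_free_map(bedata, map);
    kfree(map);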

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/pvcalls-front.c | 98 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/xen/pvcalls-front.h |  1 +
 2 files changed, 99 insertions(+)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index aca2b32..9beb34d 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -200,6 +200,19 @@ static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
 static void pvcalls_front_free_map(struct pvcalls_bedata *bedata,
 				   struct sock_mapping *map)
 {
+	int i;
+
+	unbind_from_irqhandler(map->active.irq, map);
+
+	spin_lock(&bedata->socket_lock);
+	if (!list_empty(&map->list))
+		list_del_init(&map->list);
+	spin_unlock(&bedata->socket_lock);
+
+	for (i = 0; i < (1 << PVCALLS_RING_ORDER); i++)
+		gnttab_end_foreign_access(map->active.ring->ref[i], 0, 0);
+	gnttab_end_foreign_access(map->active.ref, 0, 0);
+	free_page((unsigned long)map->active.ring);
 }
 
 static irqreturn_t pvcalls_front_conn_handler(int irq, void *sock_map)
@@ -968,6 +981,91 @@ unsigned int pvcalls_front_poll(struct file *file, struct socket *sock,
 	return ret;
 }
 
+int pvcalls_front_release(struct socket *sock)
+{
+	struct pvcalls_bedata *bedata;
+	struct sock_mapping *map;
+	int req_id, notify, ret;
+	struct xen_pvcalls_request *req;
+
+	if (sock->sk == NULL)
+		return 0;
+
+	pvcalls_enter();
+	if (!pvcalls_front_dev) {
+		pvcalls_exit();
+		return -EIO;
+	}
+
+	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
+
+	map = (struct sock_mapping *) sock->sk->sk_send_head;
+	if (map == NULL) {
+		pvcalls_exit();
+		return 0;
+	}
+
+	spin_lock(&bedata->socket_lock);
+	ret = get_request(bedata, &req_id);
+	if (ret < 0) {
+		spin_unlock(&bedata->socket_lock);
+		pvcalls_exit();
+		return ret;
+	}
+	sock->sk->sk_send_head = NULL;
+
+	req = RING_GET_REQUEST(&bedata->ring, req_id);
+	req->req_id = req_id;
+	req->cmd = PVCALLS_RELEASE;
+	req->u.release.id = (uint64_t)map;
+
+	bedata->ring.req_prod_pvt++;
+	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify);
+	spin_unlock(&bedata->socket_lock);
+	if (notify)
+		notify_remote_via_irq(bedata->irq);
+
+	wait_event(bedata->inflight_req,
+		   READ_ONCE(bedata->rsp[req_id].req_id) == req_id);
+
+	if (map->active_socket) {
+		/*
+		 * Set in_error and wake up inflight_conn_req to force
+		 * recvmsg waiters to exit.
+		 */
+		map->active.ring->in_error = -EBADF;
+		wake_up_interruptible(&map->active.inflight_conn_req);
+
+		/*
+		 * Wait until there are no more waiters on the mutexes.
+		 * We know that no new waiters can be added because sk_send_head
+		 * is set to NULL -- we only need to wait for the existing
+		 * waiters to return.
+		 */
+		while (!mutex_trylock(&map->active.in_mutex) ||
+			   !mutex_trylock(&map->active.out_mutex))
+			cpu_relax();
+
+		pvcalls_front_free_map(bedata, map);
+		kfree(map);
+	} else {
+		spin_lock(&bedata->socket_lock);
+		if (READ_ONCE(map->passive.inflight_req_id) !=
+		    PVCALLS_INVALID_ID) {
+			pvcalls_front_free_map(bedata,
+					       map->passive.accept_map);
+			kfree(map->passive.accept_map);
+		}
+		list_del_init(&map->list);
+		kfree(map);
+		spin_unlock(&bedata->socket_lock);
+	}
+	WRITE_ONCE(bedata->rsp[req_id].req_id, PVCALLS_INVALID_ID);
+
+	pvcalls_exit();
+	return 0;
+}
+
 static const struct xenbus_device_id pvcalls_front_ids[] = {
 	{ "pvcalls" },
 	{ "" }
diff --git a/drivers/xen/pvcalls-front.h b/drivers/xen/pvcalls-front.h
index 25e05b8..3332978 100644
--- a/drivers/xen/pvcalls-front.h
+++ b/drivers/xen/pvcalls-front.h
@@ -23,5 +23,6 @@ int pvcalls_front_recvmsg(struct socket *sock,
 unsigned int pvcalls_front_poll(struct file *file,
 				struct socket *sock,
 				poll_table *wait);
+int pvcalls_front_release(struct socket *sock);
 
 #endif
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v5 13/13] xen: introduce a Kconfig option to enable the pvcalls frontend
  2017-10-07  0:30   ` Stefano Stabellini
@ 2017-10-07  0:30     ` Stefano Stabellini
  -1 siblings, 0 replies; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-07  0:30 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, sstabellini, jgross, boris.ostrovsky, Stefano Stabellini

Also add pvcalls-front to the Makefile.
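
With this option the frontend can be built as a module, e.g. with:

    CONFIG_XEN_PVCALLS_FRONTEND=m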

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
---
 drivers/xen/Kconfig  | 9 +++++++++
 drivers/xen/Makefile | 1 +
 2 files changed, 10 insertions(+)

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 4545561..0b2c828 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -196,6 +196,15 @@ config XEN_PCIDEV_BACKEND
 
 	  If in doubt, say m.
 
+config XEN_PVCALLS_FRONTEND
+	tristate "XEN PV Calls frontend driver"
+	depends on INET && XEN
+	help
+	  Experimental frontend for the Xen PV Calls protocol
+	  (https://xenbits.xen.org/docs/unstable/misc/pvcalls.html). It
+	  sends a small set of POSIX calls to the backend, which
+	  implements them.
+
 config XEN_PVCALLS_BACKEND
 	bool "XEN PV Calls backend driver"
 	depends on INET && XEN && XEN_BACKEND
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index 480b928..afb9e03 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -39,6 +39,7 @@ obj-$(CONFIG_XEN_EFI)			+= efi.o
 obj-$(CONFIG_XEN_SCSI_BACKEND)		+= xen-scsiback.o
 obj-$(CONFIG_XEN_AUTO_XLATE)		+= xlate_mmu.o
 obj-$(CONFIG_XEN_PVCALLS_BACKEND)	+= pvcalls-back.o
+obj-$(CONFIG_XEN_PVCALLS_FRONTEND)	+= pvcalls-front.o
 xen-evtchn-y				:= evtchn.o
 xen-gntdev-y				:= gntdev.o
 xen-gntalloc-y				:= gntalloc.o
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 02/13] xen/pvcalls: implement frontend disconnect
  2017-10-07  0:30   ` Stefano Stabellini
  2017-10-17 16:01     ` Boris Ostrovsky
@ 2017-10-17 16:01     ` Boris Ostrovsky
  2017-10-23 22:44       ` Stefano Stabellini
  2017-10-23 22:44       ` Stefano Stabellini
  1 sibling, 2 replies; 73+ messages in thread
From: Boris Ostrovsky @ 2017-10-17 16:01 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: linux-kernel, jgross, Stefano Stabellini

On 10/06/2017 08:30 PM, Stefano Stabellini wrote:
> Introduce a data structure named pvcalls_bedata. It contains pointers to
> the command ring, the event channel, a list of active sockets and a list
> of passive sockets. Lists accesses are protected by a spin_lock.
>
> Introduce a waitqueue to allow waiting for a response on commands sent
> to the backend.
>
> Introduce an array of struct xen_pvcalls_response to store commands
> responses.
>
> pvcalls_refcount is used to keep count of the outstanding pvcalls users.
> Only remove connections once the refcount is zero.
>
> Implement pvcalls frontend removal function. Go through the list of
> active and passive sockets and free them all, one at a time.
>
> Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> CC: boris.ostrovsky@oracle.com
> CC: jgross@suse.com
> ---
>  drivers/xen/pvcalls-front.c | 67 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 67 insertions(+)
>
> diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
> index a8d38c2..d8b7a04 100644
> --- a/drivers/xen/pvcalls-front.c
> +++ b/drivers/xen/pvcalls-front.c
> @@ -20,6 +20,46 @@
>  #include <xen/xenbus.h>
>  #include <xen/interface/io/pvcalls.h>
>  
> +#define PVCALLS_INVALID_ID UINT_MAX
> +#define PVCALLS_RING_ORDER XENBUS_MAX_RING_GRANT_ORDER
> +#define PVCALLS_NR_REQ_PER_RING __CONST_RING_SIZE(xen_pvcalls, XEN_PAGE_SIZE)
> +
> +struct pvcalls_bedata {
> +	struct xen_pvcalls_front_ring ring;
> +	grant_ref_t ref;
> +	int irq;
> +
> +	struct list_head socket_mappings;
> +	struct list_head socketpass_mappings;
> +	spinlock_t socket_lock;
> +
> +	wait_queue_head_t inflight_req;
> +	struct xen_pvcalls_response rsp[PVCALLS_NR_REQ_PER_RING];

Did you mean _REQ_ or _RSP_ in the macro name?

> +};
> +/* Only one front/back connection supported. */
> +static struct xenbus_device *pvcalls_front_dev;
> +static atomic_t pvcalls_refcount;
> +
> +/* first increment refcount, then proceed */
> +#define pvcalls_enter() {               \
> +	atomic_inc(&pvcalls_refcount);      \
> +}
> +
> +/* first complete other operations, then decrement refcount */
> +#define pvcalls_exit() {                \
> +	atomic_dec(&pvcalls_refcount);      \
> +}
> +
> +static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
> +{
> +	return IRQ_HANDLED;
> +}
> +
> +static void pvcalls_front_free_map(struct pvcalls_bedata *bedata,
> +				   struct sock_mapping *map)
> +{
> +}
> +
>  static const struct xenbus_device_id pvcalls_front_ids[] = {
>  	{ "pvcalls" },
>  	{ "" }
> @@ -27,6 +67,33 @@
>  
>  static int pvcalls_front_remove(struct xenbus_device *dev)
>  {
> +	struct pvcalls_bedata *bedata;
> +	struct sock_mapping *map = NULL, *n;
> +
> +	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
> +	dev_set_drvdata(&dev->dev, NULL);
> +	pvcalls_front_dev = NULL;
> +	if (bedata->irq >= 0)
> +		unbind_from_irqhandler(bedata->irq, dev);
> +
> +	smp_mb();
> +	while (atomic_read(&pvcalls_refcount) > 0)
> +		cpu_relax();
> +	list_for_each_entry_safe(map, n, &bedata->socket_mappings, list) {
> +		pvcalls_front_free_map(bedata, map);
> +		kfree(map);
> +	}
> +	list_for_each_entry_safe(map, n, &bedata->socketpass_mappings, list) {
> +		spin_lock(&bedata->socket_lock);
> +		list_del_init(&map->list);
> +		spin_unlock(&bedata->socket_lock);
> +		kfree(map);

Why do you re-init the entry if you are freeing it? And do you really
need the locks around it? This looks similar to the case we've discussed
for other patches: if we are concerned that someone may still grab this
entry, then something must be wrong.
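
I.e. something along these lines should be enough here (just a sketch):

	list_for_each_entry_safe(map, n, &bedata->socketpass_mappings, list) {
		/* nothing else can reach the list at this point, so no
		 * lock, and no need to re-init an entry we free anyway */
		list_del(&map->list);
		kfree(map);
	}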

(Sorry, this must have been here in earlier versions but I only now
noticed it.)

-boris

> +	}
> +	if (bedata->ref >= 0)
> +		gnttab_end_foreign_access(bedata->ref, 0, 0);
> +	kfree(bedata->ring.sring);
> +	kfree(bedata);
> +	xenbus_switch_state(dev, XenbusStateClosed);
>  	return 0;
>  }
>  

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 04/13] xen/pvcalls: implement socket command and handle events
  2017-10-07  0:30     ` Stefano Stabellini
  (?)
  (?)
@ 2017-10-17 16:59     ` Boris Ostrovsky
  2017-10-20  1:26       ` Stefano Stabellini
  2017-10-20  1:26       ` Stefano Stabellini
  -1 siblings, 2 replies; 73+ messages in thread
From: Boris Ostrovsky @ 2017-10-17 16:59 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: linux-kernel, jgross, Stefano Stabellini

On 10/06/2017 08:30 PM, Stefano Stabellini wrote:
> Send a PVCALLS_SOCKET command to the backend, use the masked
> req_prod_pvt as req_id. This way, req_id is guaranteed to be between 0
> and PVCALLS_NR_REQ_PER_RING. We already have a slot in the rsp array
> ready for the response, and there cannot be two outstanding responses
> with the same req_id.
>
> Wait for the response by waiting on the inflight_req waitqueue and
> check for the req_id field in rsp[req_id]. Use atomic accesses and
> barriers to read the field. Note that the barriers are simple smp
> barriers (as opposed to virt barriers) because they are for internal
> frontend synchronization, not frontend<->backend communication.
>
> Once a response is received, clear the corresponding rsp slot by setting
> req_id to PVCALLS_INVALID_ID. Note that PVCALLS_INVALID_ID is invalid
> only from the frontend point of view. It is not part of the PVCalls
> protocol.
>
> pvcalls_front_event_handler is in charge of copying responses from the
> ring to the appropriate rsp slot. It is done by copying the body of the
> response first, then by copying req_id atomically. After the copies,
> wake up anybody waiting on waitqueue.
>
> socket_lock protects accesses to the ring.
>
> Create a new struct sock_mapping and convert the pointer into an
> uint64_t and use it as id for the new socket to pass to the backend. The
> struct will be fully initialized later on connect or bind. In this patch
> the struct sock_mapping is empty, the fields will be added by the next
> patch.
>
> sock->sk->sk_send_head is not used for ip sockets: reuse the field to
> store a pointer to the struct sock_mapping corresponding to the socket.
> This way, we can easily get the struct sock_mapping from the struct
> socket.
>
> Signed-off-by: Stefano Stabellini <stefano@aporeto.com>

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

with one question:

> +	/*
> +	 * PVCalls only supports domain AF_INET,
> +	 * type SOCK_STREAM and protocol 0 sockets for now.
> +	 *
> +	 * Check socket type here, AF_INET and protocol checks are done
> +	 * by the caller.
> +	 */
> +	if (sock->type != SOCK_STREAM)
> +		return -ENOTSUPP;
> +


Is this ENOTSUPP or EOPNOTSUPP? I didn't know the former even existed
and include/linux/errno.h suggests that this is NFSv3-specific.
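
I.e. I would have expected something like (sketch):

	if (sock->type != SOCK_STREAM)
		return -EOPNOTSUPP;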

-boris

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 06/13] xen/pvcalls: implement bind command
  2017-10-07  0:30     ` Stefano Stabellini
  (?)
@ 2017-10-17 17:39     ` Boris Ostrovsky
  2017-10-20  1:31       ` Stefano Stabellini
  2017-10-20  1:31       ` Stefano Stabellini
  -1 siblings, 2 replies; 73+ messages in thread
From: Boris Ostrovsky @ 2017-10-17 17:39 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: linux-kernel, jgross, Stefano Stabellini

On 10/06/2017 08:30 PM, Stefano Stabellini wrote:
> Send PVCALLS_BIND to the backend. Introduce a new structure, part of
> struct sock_mapping, to store information specific to passive sockets.
>
> Introduce a status field to keep track of the status of the passive
> socket.
>
> Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> CC: boris.ostrovsky@oracle.com
> CC: jgross@suse.com
> ---
>  drivers/xen/pvcalls-front.c | 66 +++++++++++++++++++++++++++++++++++++++++++++
>  drivers/xen/pvcalls-front.h |  3 +++
>  2 files changed, 69 insertions(+)
>
> diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
> index 7c9261b..4cafd9b 100644
> --- a/drivers/xen/pvcalls-front.c
> +++ b/drivers/xen/pvcalls-front.c
> @@ -71,6 +71,13 @@ struct sock_mapping {
>  
>  			wait_queue_head_t inflight_conn_req;
>  		} active;
> +		struct {
> +		/* Socket status */
> +#define PVCALLS_STATUS_UNINITALIZED  0
> +#define PVCALLS_STATUS_BIND          1
> +#define PVCALLS_STATUS_LISTEN        2
> +			uint8_t status;
> +		} passive;
>  	};
>  };
>  
> @@ -347,6 +354,65 @@ int pvcalls_front_connect(struct socket *sock, struct sockaddr *addr,
>  	return ret;
>  }
>  
> +int pvcalls_front_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
> +{
> +	struct pvcalls_bedata *bedata;
> +	struct sock_mapping *map = NULL;
> +	struct xen_pvcalls_request *req;
> +	int notify, req_id, ret;
> +
> +	if (addr->sa_family != AF_INET || sock->type != SOCK_STREAM)
> +		return -ENOTSUPP;
> +
> +	pvcalls_enter();
> +	if (!pvcalls_front_dev) {
> +		pvcalls_exit();
> +		return -ENOTCONN;

The connect patch returns -ENETUNREACH here. Is there a deliberate
distinction between these cases?

Other than that

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 08/13] xen/pvcalls: implement accept command
  2017-10-07  0:30   ` [PATCH v5 08/13] xen/pvcalls: implement accept command Stefano Stabellini
  2017-10-17 18:34     ` Boris Ostrovsky
@ 2017-10-17 18:34     ` Boris Ostrovsky
  2017-10-23 23:03       ` Stefano Stabellini
  2017-10-23 23:03       ` Stefano Stabellini
  1 sibling, 2 replies; 73+ messages in thread
From: Boris Ostrovsky @ 2017-10-17 18:34 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: linux-kernel, jgross, Stefano Stabellini

On 10/06/2017 08:30 PM, Stefano Stabellini wrote:
> Introduce a waitqueue to allow only one outstanding accept command at
> any given time and to implement polling on the passive socket. Introduce
> a flags field to keep track of in-flight accept and poll commands.
> 
> Send PVCALLS_ACCEPT to the backend. Allocate a new active socket. Make
> sure that only one accept command is executed at any given time by
> setting PVCALLS_FLAG_ACCEPT_INFLIGHT and waiting on the
> inflight_accept_req waitqueue.
> 
> Convert the new struct sock_mapping pointer into an uint64_t and use it
> as id for the new socket to pass to the backend.
> 
> Check if the accept call is non-blocking: in that case after sending the
> ACCEPT command to the backend store the sock_mapping pointer of the new
> struct and the inflight req_id then return -EAGAIN (which will respond
> only when there is something to accept). Next time accept is called,
> we'll check if the ACCEPT command has been answered, if so we'll pick up
> where we left off, otherwise we return -EAGAIN again.
> 
> Note that, differently from the other commands, we can use
> wait_event_interruptible (instead of wait_event) in the case of accept
> as we are able to track the req_id of the ACCEPT response that we are
> waiting.
> 
> Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> CC: boris.ostrovsky@oracle.com
> CC: jgross@suse.com
> ---
>  drivers/xen/pvcalls-front.c | 146 ++++++++++++++++++++++++++++++++++++++++++++
>  drivers/xen/pvcalls-front.h |   3 +
>  2 files changed, 149 insertions(+)
> 
> diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
> index 5433fae..8958e74 100644
> --- a/drivers/xen/pvcalls-front.c
> +++ b/drivers/xen/pvcalls-front.c
> @@ -77,6 +77,16 @@ struct sock_mapping {
>  #define PVCALLS_STATUS_BIND          1
>  #define PVCALLS_STATUS_LISTEN        2
>  			uint8_t status;
> +		/*
> +		 * Internal state-machine flags.
> +		 * Only one accept operation can be inflight for a socket.
> +		 * Only one poll operation can be inflight for a given socket.
> +		 */
> +#define PVCALLS_FLAG_ACCEPT_INFLIGHT 0
> +			uint8_t flags;
> +			uint32_t inflight_req_id;
> +			struct sock_mapping *accept_map;
> +			wait_queue_head_t inflight_accept_req;
>  		} passive;
>  	};
>  };
> @@ -392,6 +402,8 @@ int pvcalls_front_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
>  	memcpy(req->u.bind.addr, addr, sizeof(*addr));
>  	req->u.bind.len = addr_len;
>  
> +	init_waitqueue_head(&map->passive.inflight_accept_req);
> +
>  	map->active_socket = false;
>  
>  	bedata->ring.req_prod_pvt++;
> @@ -470,6 +482,140 @@ int pvcalls_front_listen(struct socket *sock, int backlog)
>  	return ret;
>  }
>  
> +int pvcalls_front_accept(struct socket *sock, struct socket *newsock, int flags)
> +{
> +	struct pvcalls_bedata *bedata;
> +	struct sock_mapping *map;
> +	struct sock_mapping *map2 = NULL;
> +	struct xen_pvcalls_request *req;
> +	int notify, req_id, ret, evtchn, nonblock;
> +
> +	pvcalls_enter();
> +	if (!pvcalls_front_dev) {
> +		pvcalls_exit();
> +		return -ENOTCONN;
> +	}
> +	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
> +
> +	map = (struct sock_mapping *) sock->sk->sk_send_head;
> +	if (!map) {
> +		pvcalls_exit();
> +		return -ENOTSOCK;
> +	}
> +
> +	if (map->passive.status != PVCALLS_STATUS_LISTEN) {
> +		pvcalls_exit();
> +		return -EINVAL;
> +	}
> +
> +	nonblock = flags & SOCK_NONBLOCK;
> +	/*
> +	 * Backend only supports 1 inflight accept request, will return
> +	 * errors for the others
> +	 */
> +	if (test_and_set_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
> +			     (void *)&map->passive.flags)) {
> +		req_id = READ_ONCE(map->passive.inflight_req_id);
> +		if (req_id != PVCALLS_INVALID_ID &&
> +		    READ_ONCE(bedata->rsp[req_id].req_id) == req_id) {


READ_ONCE (especially the second one)? I know I may sound fixated on
this, but I really don't understand how the compiler could do anything
wrong if straight reads were used.

For the first case, I guess the compiler could theoretically decide to
re-fetch map->passive.inflight_req_id. But even if it did, would that be
a problem? Both of these READ_ONCE targets are updated below before
PVCALLS_FLAG_ACCEPT_INFLIGHT is cleared, so there should not be any
change between re-fetches, I think. (The only exception is the
nonblocking case, which does a WRITE_ONCE that I don't understand either.)
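
For reference, the difference being discussed is only this (sketch):

	/* plain read: the compiler is free to re-load the field later */
	req_id = map->passive.inflight_req_id;

	/* READ_ONCE(): exactly one access, no re-fetching and no tearing */
	req_id = READ_ONCE(map->passive.inflight_req_id);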


> +			map2 = map->passive.accept_map;
> +			goto received;
> +		}
> +		if (nonblock) {
> +			pvcalls_exit();
> +			return -EAGAIN;
> +		}
> +		if (wait_event_interruptible(map->passive.inflight_accept_req,
> +			!test_and_set_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
> +					  (void *)&map->passive.flags))) {
> +			pvcalls_exit();
> +			return -EINTR;
> +		}
> +	}
> +
> +	spin_lock(&bedata->socket_lock);
> +	ret = get_request(bedata, &req_id);
> +	if (ret < 0) {
> +		clear_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
> +			  (void *)&map->passive.flags);
> +		spin_unlock(&bedata->socket_lock);
> +		pvcalls_exit();
> +		return ret;
> +	}
> +	map2 = kzalloc(sizeof(*map2), GFP_KERNEL);
> +	if (map2 == NULL) {
> +		clear_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
> +			  (void *)&map->passive.flags);
> +		spin_unlock(&bedata->socket_lock);
> +		pvcalls_exit();
> +		return -ENOMEM;
> +	}
> +	ret =  create_active(map2, &evtchn);
> +	if (ret < 0) {
> +		kfree(map2);
> +		clear_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
> +			  (void *)&map->passive.flags);
> +		spin_unlock(&bedata->socket_lock);
> +		pvcalls_exit();
> +		return -ENOMEM;

Why not ret?
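
I.e. (sketch):

	ret = create_active(map2, &evtchn);
	if (ret < 0) {
		kfree(map2);
		clear_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
			  (void *)&map->passive.flags);
		spin_unlock(&bedata->socket_lock);
		pvcalls_exit();
		return ret;	/* propagate create_active()'s error */
	}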

-boris


> +	}
> +	list_add_tail(&map2->list, &bedata->socket_mappings);
> +
> +	req = RING_GET_REQUEST(&bedata->ring, req_id);
> +	req->req_id = req_id;
> +	req->cmd = PVCALLS_ACCEPT;
> +	req->u.accept.id = (uint64_t) map;
> +	req->u.accept.ref = map2->active.ref;
> +	req->u.accept.id_new = (uint64_t) map2;
> +	req->u.accept.evtchn = evtchn;
> +	map->passive.accept_map = map2;
> +
> +	bedata->ring.req_prod_pvt++;
> +	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify);
> +	spin_unlock(&bedata->socket_lock);
> +	if (notify)
> +		notify_remote_via_irq(bedata->irq);
> +	/* We could check if we have received a response before returning. */
> +	if (nonblock) {
> +		WRITE_ONCE(map->passive.inflight_req_id, req_id);
> +		pvcalls_exit();
> +		return -EAGAIN;
> +	}
> +
> +	if (wait_event_interruptible(bedata->inflight_req,
> +		READ_ONCE(bedata->rsp[req_id].req_id) == req_id)) {
> +		pvcalls_exit();
> +		return -EINTR;
> +	}
> +	/* read req_id, then the content */
> +	smp_rmb();
> +
> +received:
> +	map2->sock = newsock;
> +	newsock->sk = kzalloc(sizeof(*newsock->sk), GFP_KERNEL);
> +	if (!newsock->sk) {
> +		bedata->rsp[req_id].req_id = PVCALLS_INVALID_ID;
> +		map->passive.inflight_req_id = PVCALLS_INVALID_ID;
> +		clear_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
> +			  (void *)&map->passive.flags);
> +		pvcalls_front_free_map(bedata, map2);
> +		kfree(map2);
> +		pvcalls_exit();
> +		return -ENOMEM;
> +	}
> +	newsock->sk->sk_send_head = (void *)map2;
> +
> +	ret = bedata->rsp[req_id].ret;
> +	bedata->rsp[req_id].req_id = PVCALLS_INVALID_ID;
> +	map->passive.inflight_req_id = PVCALLS_INVALID_ID;
> +
> +	clear_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT, (void *)&map->passive.flags);
> +	wake_up(&map->passive.inflight_accept_req);
> +
> +	pvcalls_exit();
> +	return ret;
> +}
> +
>  static const struct xenbus_device_id pvcalls_front_ids[] = {
>  	{ "pvcalls" },
>  	{ "" }
> diff --git a/drivers/xen/pvcalls-front.h b/drivers/xen/pvcalls-front.h
> index aa8fe10..ab4f1da 100644
> --- a/drivers/xen/pvcalls-front.h
> +++ b/drivers/xen/pvcalls-front.h
> @@ -10,5 +10,8 @@ int pvcalls_front_bind(struct socket *sock,
>  		       struct sockaddr *addr,
>  		       int addr_len);
>  int pvcalls_front_listen(struct socket *sock, int backlog);
> +int pvcalls_front_accept(struct socket *sock,
> +			 struct socket *newsock,
> +			 int flags);
>  
>  #endif
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 09/13] xen/pvcalls: implement sendmsg
  2017-10-07  0:30     ` Stefano Stabellini
  (?)
  (?)
@ 2017-10-17 21:06     ` Boris Ostrovsky
  2017-10-20  1:41       ` Stefano Stabellini
  2017-10-20  1:41       ` Stefano Stabellini
  -1 siblings, 2 replies; 73+ messages in thread
From: Boris Ostrovsky @ 2017-10-17 21:06 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: linux-kernel, jgross, Stefano Stabellini


> +static int __write_ring(struct pvcalls_data_intf *intf,
> +			struct pvcalls_data *data,
> +			struct iov_iter *msg_iter,
> +			int len)
> +{
> +	RING_IDX cons, prod, size, masked_prod, masked_cons;
> +	RING_IDX array_size = XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER);
> +	int32_t error;
> +
> +	error = intf->out_error;
> +	if (error < 0)
> +		return error;
> +	cons = intf->out_cons;
> +	prod = intf->out_prod;
> +	/* read indexes before continuing */
> +	virt_mb();
> +
> +	size = pvcalls_queued(prod, cons, array_size);
> +	if (size >= array_size)
> +		return 0;


I thought you were going to return an error here? If this can only be
due to someone messing up the indexes, is there a reason to continue
trying to write? What are the chances that the index will get corrected?
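
Something along these lines, perhaps (sketch):

	size = pvcalls_queued(prod, cons, array_size);
	if (size > array_size)
		return -EINVAL;		/* corrupted indexes, give up */
	if (size == array_size)
		return 0;		/* ring full, nothing written */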

-boris

> +	if (len > array_size - size)
> +		len = array_size - size;
> +
> +	masked_prod = pvcalls_mask(prod, array_size);
> +	masked_cons = pvcalls_mask(cons, array_size);
> +
> +	if (masked_prod < masked_cons) {
> +		copy_from_iter(data->out + masked_prod, len, msg_iter);
> +	} else {
> +		if (len > array_size - masked_prod) {
> +			copy_from_iter(data->out + masked_prod,
> +				       array_size - masked_prod, msg_iter);
> +			copy_from_iter(data->out,
> +				       len - (array_size - masked_prod),
> +				       msg_iter);
> +		} else {
> +			copy_from_iter(data->out + masked_prod, len, msg_iter);
> +		}
> +	}
> +	/* write to ring before updating pointer */
> +	virt_wmb();
> +	intf->out_prod += len;
> +
> +	return len;
> +}

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 10/13] xen/pvcalls: implement recvmsg
  2017-10-07  0:30   ` Stefano Stabellini
  2017-10-17 21:35     ` Boris Ostrovsky
@ 2017-10-17 21:35     ` Boris Ostrovsky
  2017-10-20  1:38       ` Stefano Stabellini
  2017-10-20  1:38       ` Stefano Stabellini
  1 sibling, 2 replies; 73+ messages in thread
From: Boris Ostrovsky @ 2017-10-17 21:35 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: linux-kernel, jgross, Stefano Stabellini


> +
> +int pvcalls_front_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
> +		     int flags)
> +{
> +	struct pvcalls_bedata *bedata;
> +	int ret;
> +	struct sock_mapping *map;
> +
> +	if (flags & (MSG_CMSG_CLOEXEC|MSG_ERRQUEUE|MSG_OOB|MSG_TRUNC))
> +		return -EOPNOTSUPP;
> +
> +	pvcalls_enter();
> +	if (!pvcalls_front_dev) {
> +		pvcalls_exit();
> +		return -ENOTCONN;
> +	}
> +	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
> +
> +	map = (struct sock_mapping *) sock->sk->sk_send_head;
> +	if (!map) {
> +		pvcalls_exit();
> +		return -ENOTSOCK;
> +	}
> +
> +	mutex_lock(&map->active.in_mutex);
> +	if (len > XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER))
> +		len = XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER);
> +
> +	while (!(flags & MSG_DONTWAIT) && !pvcalls_front_read_todo(map)) {
> +		wait_event_interruptible(map->active.inflight_conn_req,
> +					 pvcalls_front_read_todo(map));
> +	}
> +	ret = __read_ring(map->active.ring, &map->active.data,
> +			  &msg->msg_iter, len, flags);
> +
> +	if (ret > 0)
> +		notify_remote_via_irq(map->active.irq);
> +	if (ret == 0)
> +		ret = -EAGAIN;

Why not 0? The manpage says:

       EAGAIN or EWOULDBLOCK
              The  socket  is  marked nonblocking and the receive
operation would block, or a receive timeout
              had been set and the timeout expired before data was
received.  POSIX.1 allows either error  to
              be  returned  for  this case, and does not require these
constants to have the same value, so a
              portable application should check for both possibilities.


I don't think either of these conditions is true here.

(Again, should have noticed this earlier, sorry)

-boris


> +	if (ret == -ENOTCONN)
> +		ret = 0;
> +
> +	mutex_unlock(&map->active.in_mutex);
> +	pvcalls_exit();
> +	return ret;
> +}

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 10/13] xen/pvcalls: implement recvmsg
  2017-10-07  0:30   ` Stefano Stabellini
@ 2017-10-17 21:35     ` Boris Ostrovsky
  2017-10-17 21:35     ` Boris Ostrovsky
  1 sibling, 0 replies; 73+ messages in thread
From: Boris Ostrovsky @ 2017-10-17 21:35 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: jgross, Stefano Stabellini, linux-kernel


> +
> +int pvcalls_front_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
> +		     int flags)
> +{
> +	struct pvcalls_bedata *bedata;
> +	int ret;
> +	struct sock_mapping *map;
> +
> +	if (flags & (MSG_CMSG_CLOEXEC|MSG_ERRQUEUE|MSG_OOB|MSG_TRUNC))
> +		return -EOPNOTSUPP;
> +
> +	pvcalls_enter();
> +	if (!pvcalls_front_dev) {
> +		pvcalls_exit();
> +		return -ENOTCONN;
> +	}
> +	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
> +
> +	map = (struct sock_mapping *) sock->sk->sk_send_head;
> +	if (!map) {
> +		pvcalls_exit();
> +		return -ENOTSOCK;
> +	}
> +
> +	mutex_lock(&map->active.in_mutex);
> +	if (len > XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER))
> +		len = XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER);
> +
> +	while (!(flags & MSG_DONTWAIT) && !pvcalls_front_read_todo(map)) {
> +		wait_event_interruptible(map->active.inflight_conn_req,
> +					 pvcalls_front_read_todo(map));
> +	}
> +	ret = __read_ring(map->active.ring, &map->active.data,
> +			  &msg->msg_iter, len, flags);
> +
> +	if (ret > 0)
> +		notify_remote_via_irq(map->active.irq);
> +	if (ret == 0)
> +		ret = -EAGAIN;

Why not 0? The manpage says:

       EAGAIN or EWOULDBLOCK
              The  socket  is  marked nonblocking and the receive
operation would block, or a receive timeout
              had been set and the timeout expired before data was
received.  POSIX.1 allows either error  to
              be  returned  for  this case, and does not require these
constants to have the same value, so a
              portable application should check for both possibilities.


I don't think either of these conditions is true here.

(Again, should have noticed this earlier, sorry)

-boris


> +	if (ret == -ENOTCONN)
> +		ret = 0;
> +
> +	mutex_unlock(&map->active.in_mutex);
> +	pvcalls_exit();
> +	return ret;
> +}



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 11/13] xen/pvcalls: implement poll command
  2017-10-07  0:30     ` Stefano Stabellini
@ 2017-10-17 22:15       ` Boris Ostrovsky
  -1 siblings, 0 replies; 73+ messages in thread
From: Boris Ostrovsky @ 2017-10-17 22:15 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: linux-kernel, jgross, Stefano Stabellini


>  
> +static unsigned int pvcalls_front_poll_passive(struct file *file,
> +					       struct pvcalls_bedata *bedata,
> +					       struct sock_mapping *map,
> +					       poll_table *wait)
> +{
> +	int notify, req_id, ret;
> +	struct xen_pvcalls_request *req;
> +
> +	if (test_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
> +		     (void *)&map->passive.flags)) {
> +		uint32_t req_id = READ_ONCE(map->passive.inflight_req_id);
> +
> +		if (req_id != PVCALLS_INVALID_ID &&
> +		    READ_ONCE(bedata->rsp[req_id].req_id) == req_id)
> +			return POLLIN | POLLRDNORM;


Same READ_ONCE() question as for an earlier patch.

-boris

> +
> +		poll_wait(file, &map->passive.inflight_accept_req, wait);
> +		return 0;
> +	}
> +

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 11/13] xen/pvcalls: implement poll command
@ 2017-10-17 22:15       ` Boris Ostrovsky
  0 siblings, 0 replies; 73+ messages in thread
From: Boris Ostrovsky @ 2017-10-17 22:15 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: jgross, Stefano Stabellini, linux-kernel


>  
> +static unsigned int pvcalls_front_poll_passive(struct file *file,
> +					       struct pvcalls_bedata *bedata,
> +					       struct sock_mapping *map,
> +					       poll_table *wait)
> +{
> +	int notify, req_id, ret;
> +	struct xen_pvcalls_request *req;
> +
> +	if (test_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
> +		     (void *)&map->passive.flags)) {
> +		uint32_t req_id = READ_ONCE(map->passive.inflight_req_id);
> +
> +		if (req_id != PVCALLS_INVALID_ID &&
> +		    READ_ONCE(bedata->rsp[req_id].req_id) == req_id)
> +			return POLLIN | POLLRDNORM;


Same READ_ONCE() question as for an earlier patch.

-boris

> +
> +		poll_wait(file, &map->passive.inflight_accept_req, wait);
> +		return 0;
> +	}
> +



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 04/13] xen/pvcalls: implement socket command and handle events
  2017-10-17 16:59     ` Boris Ostrovsky
  2017-10-20  1:26       ` Stefano Stabellini
@ 2017-10-20  1:26       ` Stefano Stabellini
  2017-10-20 14:24         ` Boris Ostrovsky
  2017-10-20 14:24         ` Boris Ostrovsky
  1 sibling, 2 replies; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-20  1:26 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Stefano Stabellini, xen-devel, linux-kernel, jgross, Stefano Stabellini

On Tue, 17 Oct 2017, Boris Ostrovsky wrote:
> On 10/06/2017 08:30 PM, Stefano Stabellini wrote:
> > Send a PVCALLS_SOCKET command to the backend, use the masked
> > req_prod_pvt as req_id. This way, req_id is guaranteed to be between 0
> > and PVCALLS_NR_REQ_PER_RING. We already have a slot in the rsp array
> > ready for the response, and there cannot be two outstanding responses
> > with the same req_id.
> >
> > Wait for the response by waiting on the inflight_req waitqueue and
> > check for the req_id field in rsp[req_id]. Use atomic accesses and
> > barriers to read the field. Note that the barriers are simple smp
> > barriers (as opposed to virt barriers) because they are for internal
> > frontend synchronization, not frontend<->backend communication.
> >
> > Once a response is received, clear the corresponding rsp slot by setting
> > req_id to PVCALLS_INVALID_ID. Note that PVCALLS_INVALID_ID is invalid
> > only from the frontend point of view. It is not part of the PVCalls
> > protocol.
> >
> > pvcalls_front_event_handler is in charge of copying responses from the
> > ring to the appropriate rsp slot. It is done by copying the body of the
> > response first, then by copying req_id atomically. After the copies,
> > wake up anybody waiting on waitqueue.
> >
> > socket_lock protects accesses to the ring.
> >
> > Create a new struct sock_mapping and convert the pointer into an
> > uint64_t and use it as id for the new socket to pass to the backend. The
> > struct will be fully initialized later on connect or bind. In this patch
> > the struct sock_mapping is empty, the fields will be added by the next
> > patch.
> >
> > sock->sk->sk_send_head is not used for ip sockets: reuse the field to
> > store a pointer to the struct sock_mapping corresponding to the socket.
> > This way, we can easily get the struct sock_mapping from the struct
> > socket.
> >
> > Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> 
> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> 
> with one question:
> 
> > +	/*
> > +	 * PVCalls only supports domain AF_INET,
> > +	 * type SOCK_STREAM and protocol 0 sockets for now.
> > +	 *
> > +	 * Check socket type here, AF_INET and protocol checks are done
> > +	 * by the caller.
> > +	 */
> > +	if (sock->type != SOCK_STREAM)
> > +		return -ENOTSUPP;
> > +
> 
> 
> Is this ENOTSUPP or EOPNOTSUPP? I didn't know the former even existed
> and include/linux/errno.h suggests that this is NFSv3-specific.

The PVCalls spec says that unimplemented commands return ENOTSUPP,
defined as -524. I guess that is why I used ENOTSUPP, but, actually,
this is the return value to the caller, which has nothing to do with the
PVCalls protocol return value. In fact, it could be something entirely
different.

In this case, I think you are correct, it is best to use EOPNOTSUPP.
I'll make the change and retain your Reviewed-by, if that's OK for you.

^ permalink raw reply	[flat|nested] 73+ messages in thread
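
As an illustration of the request/response flow described in the commit
message above, the pattern for a generic command looks roughly like the
sketch below. This is a simplified illustration based on the quoted
series, not the literal patch code; names and details may differ.

	static int pvcalls_do_cmd_sketch(struct pvcalls_bedata *bedata,
					 struct xen_pvcalls_request *r)
	{
		struct xen_pvcalls_request *req;
		int req_id, ret, notify;

		spin_lock(&bedata->socket_lock);
		/* masked req_prod_pvt keeps req_id within the rsp array */
		req_id = bedata->ring.req_prod_pvt &
			 (PVCALLS_NR_REQ_PER_RING - 1);
		req = RING_GET_REQUEST(&bedata->ring, req_id);
		*req = *r;
		req->req_id = req_id;
		bedata->ring.req_prod_pvt++;
		RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify);
		spin_unlock(&bedata->socket_lock);
		if (notify)
			notify_remote_via_irq(bedata->irq);

		/* the event handler copies the body first, req_id last */
		wait_event(bedata->inflight_req,
			   READ_ONCE(bedata->rsp[req_id].req_id) == req_id);

		/* read req_id, then the body: smp (frontend-internal) barrier */
		smp_rmb();
		ret = bedata->rsp[req_id].ret;
		/* clear the slot; PVCALLS_INVALID_ID is frontend-only */
		bedata->rsp[req_id].req_id = PVCALLS_INVALID_ID;

		return ret;
	}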

* Re: [PATCH v5 04/13] xen/pvcalls: implement socket command and handle events
  2017-10-17 16:59     ` Boris Ostrovsky
@ 2017-10-20  1:26       ` Stefano Stabellini
  2017-10-20  1:26       ` Stefano Stabellini
  1 sibling, 0 replies; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-20  1:26 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: jgross, Stefano Stabellini, Stefano Stabellini, linux-kernel, xen-devel

On Tue, 17 Oct 2017, Boris Ostrovsky wrote:
> On 10/06/2017 08:30 PM, Stefano Stabellini wrote:
> > Send a PVCALLS_SOCKET command to the backend, use the masked
> > req_prod_pvt as req_id. This way, req_id is guaranteed to be between 0
> > and PVCALLS_NR_REQ_PER_RING. We already have a slot in the rsp array
> > ready for the response, and there cannot be two outstanding responses
> > with the same req_id.
> >
> > Wait for the response by waiting on the inflight_req waitqueue and
> > check for the req_id field in rsp[req_id]. Use atomic accesses and
> > barriers to read the field. Note that the barriers are simple smp
> > barriers (as opposed to virt barriers) because they are for internal
> > frontend synchronization, not frontend<->backend communication.
> >
> > Once a response is received, clear the corresponding rsp slot by setting
> > req_id to PVCALLS_INVALID_ID. Note that PVCALLS_INVALID_ID is invalid
> > only from the frontend point of view. It is not part of the PVCalls
> > protocol.
> >
> > pvcalls_front_event_handler is in charge of copying responses from the
> > ring to the appropriate rsp slot. It is done by copying the body of the
> > response first, then by copying req_id atomically. After the copies,
> > wake up anybody waiting on waitqueue.
> >
> > socket_lock protects accesses to the ring.
> >
> > Create a new struct sock_mapping and convert the pointer into an
> > uint64_t and use it as id for the new socket to pass to the backend. The
> > struct will be fully initialized later on connect or bind. In this patch
> > the struct sock_mapping is empty, the fields will be added by the next
> > patch.
> >
> > sock->sk->sk_send_head is not used for ip sockets: reuse the field to
> > store a pointer to the struct sock_mapping corresponding to the socket.
> > This way, we can easily get the struct sock_mapping from the struct
> > socket.
> >
> > Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> 
> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> 
> with one question:
> 
> > +	/*
> > +	 * PVCalls only supports domain AF_INET,
> > +	 * type SOCK_STREAM and protocol 0 sockets for now.
> > +	 *
> > +	 * Check socket type here, AF_INET and protocol checks are done
> > +	 * by the caller.
> > +	 */
> > +	if (sock->type != SOCK_STREAM)
> > +		return -ENOTSUPP;
> > +
> 
> 
> Is this ENOTSUPP or EOPNOTSUPP? I didn't know the former even existed
> and include/linux/errno.h suggests that this is NFSv3-specific.

The PVCalls spec says that unimplemented commands return ENOTSUPP,
defined as -524. I guess that is why I used ENOTSUPP, but, actually,
this is the return value to the caller, which has nothing to do with the
PVCalls protocol return value. In fact, it could be something entirely
different.

In this case, I think you are correct, it is best to use EOPNOTSUPP.
I'll make the change and retain your Reviewed-by, if that's OK for you.


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 06/13] xen/pvcalls: implement bind command
  2017-10-17 17:39     ` Boris Ostrovsky
  2017-10-20  1:31       ` Stefano Stabellini
@ 2017-10-20  1:31       ` Stefano Stabellini
  2017-10-20 14:40         ` Boris Ostrovsky
  2017-10-20 14:40         ` Boris Ostrovsky
  1 sibling, 2 replies; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-20  1:31 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Stefano Stabellini, xen-devel, linux-kernel, jgross, Stefano Stabellini

On Tue, 17 Oct 2017, Boris Ostrovsky wrote:
> On 10/06/2017 08:30 PM, Stefano Stabellini wrote:
> > Send PVCALLS_BIND to the backend. Introduce a new structure, part of
> > struct sock_mapping, to store information specific to passive sockets.
> >
> > Introduce a status field to keep track of the status of the passive
> > socket.
> >
> > Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> > CC: boris.ostrovsky@oracle.com
> > CC: jgross@suse.com
> > ---
> >  drivers/xen/pvcalls-front.c | 66 +++++++++++++++++++++++++++++++++++++++++++++
> >  drivers/xen/pvcalls-front.h |  3 +++
> >  2 files changed, 69 insertions(+)
> >
> > diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
> > index 7c9261b..4cafd9b 100644
> > --- a/drivers/xen/pvcalls-front.c
> > +++ b/drivers/xen/pvcalls-front.c
> > @@ -71,6 +71,13 @@ struct sock_mapping {
> >  
> >  			wait_queue_head_t inflight_conn_req;
> >  		} active;
> > +		struct {
> > +		/* Socket status */
> > +#define PVCALLS_STATUS_UNINITALIZED  0
> > +#define PVCALLS_STATUS_BIND          1
> > +#define PVCALLS_STATUS_LISTEN        2
> > +			uint8_t status;
> > +		} passive;
> >  	};
> >  };
> >  
> > @@ -347,6 +354,65 @@ int pvcalls_front_connect(struct socket *sock, struct sockaddr *addr,
> >  	return ret;
> >  }
> >  
> > +int pvcalls_front_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
> > +{
> > +	struct pvcalls_bedata *bedata;
> > +	struct sock_mapping *map = NULL;
> > +	struct xen_pvcalls_request *req;
> > +	int notify, req_id, ret;
> > +
> > +	if (addr->sa_family != AF_INET || sock->type != SOCK_STREAM)
> > +		return -ENOTSUPP;
> > +
> > +	pvcalls_enter();
> > +	if (!pvcalls_front_dev) {
> > +		pvcalls_exit();
> > +		return -ENOTCONN;
> 
> The connect patch returns -ENETUNREACH here. Is there a deliberate
> distinction between these cases?

No, there isn't a deliberate distinction. Actually, all other commands
return ENOTCONN for this error, so we might as well be consistent and
change ENETUNREACH to ENOTCONN for connect.

If you agree, I'll make the change to the connect patch, and add your
reviewed-by here.
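
A minimal sketch of that change to the connect path, for illustration
(same names as the quoted patch; not the final commit):

	pvcalls_enter();
	if (!pvcalls_front_dev) {
		pvcalls_exit();
		return -ENOTCONN;	/* was -ENETUNREACH */
	}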



> Other than that
> 
> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 06/13] xen/pvcalls: implement bind command
  2017-10-17 17:39     ` Boris Ostrovsky
@ 2017-10-20  1:31       ` Stefano Stabellini
  2017-10-20  1:31       ` Stefano Stabellini
  1 sibling, 0 replies; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-20  1:31 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: jgross, Stefano Stabellini, Stefano Stabellini, linux-kernel, xen-devel

On Tue, 17 Oct 2017, Boris Ostrovsky wrote:
> On 10/06/2017 08:30 PM, Stefano Stabellini wrote:
> > Send PVCALLS_BIND to the backend. Introduce a new structure, part of
> > struct sock_mapping, to store information specific to passive sockets.
> >
> > Introduce a status field to keep track of the status of the passive
> > socket.
> >
> > Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> > CC: boris.ostrovsky@oracle.com
> > CC: jgross@suse.com
> > ---
> >  drivers/xen/pvcalls-front.c | 66 +++++++++++++++++++++++++++++++++++++++++++++
> >  drivers/xen/pvcalls-front.h |  3 +++
> >  2 files changed, 69 insertions(+)
> >
> > diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
> > index 7c9261b..4cafd9b 100644
> > --- a/drivers/xen/pvcalls-front.c
> > +++ b/drivers/xen/pvcalls-front.c
> > @@ -71,6 +71,13 @@ struct sock_mapping {
> >  
> >  			wait_queue_head_t inflight_conn_req;
> >  		} active;
> > +		struct {
> > +		/* Socket status */
> > +#define PVCALLS_STATUS_UNINITALIZED  0
> > +#define PVCALLS_STATUS_BIND          1
> > +#define PVCALLS_STATUS_LISTEN        2
> > +			uint8_t status;
> > +		} passive;
> >  	};
> >  };
> >  
> > @@ -347,6 +354,65 @@ int pvcalls_front_connect(struct socket *sock, struct sockaddr *addr,
> >  	return ret;
> >  }
> >  
> > +int pvcalls_front_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
> > +{
> > +	struct pvcalls_bedata *bedata;
> > +	struct sock_mapping *map = NULL;
> > +	struct xen_pvcalls_request *req;
> > +	int notify, req_id, ret;
> > +
> > +	if (addr->sa_family != AF_INET || sock->type != SOCK_STREAM)
> > +		return -ENOTSUPP;
> > +
> > +	pvcalls_enter();
> > +	if (!pvcalls_front_dev) {
> > +		pvcalls_exit();
> > +		return -ENOTCONN;
> 
> The connect patch returns -ENETUNREACH here. Is there a deliberate
> distinction between these cases?

No, there isn't a deliberate distinction. Actually, all other commands
return ENOTCONN for this error, so we might as well be consistent and
change ENETUNREACH to ENOTCONN for connect.

If you agree, I'll make the change to the connect patch, and add your
reviewed-by here.



> Other than that
> 
> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 10/13] xen/pvcalls: implement recvmsg
  2017-10-17 21:35     ` Boris Ostrovsky
  2017-10-20  1:38       ` Stefano Stabellini
@ 2017-10-20  1:38       ` Stefano Stabellini
  2017-10-20 14:43         ` Boris Ostrovsky
  2017-10-20 14:43         ` Boris Ostrovsky
  1 sibling, 2 replies; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-20  1:38 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Stefano Stabellini, xen-devel, linux-kernel, jgross, Stefano Stabellini

On Tue, 17 Oct 2017, Boris Ostrovsky wrote:
> > +
> > +int pvcalls_front_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
> > +		     int flags)
> > +{
> > +	struct pvcalls_bedata *bedata;
> > +	int ret;
> > +	struct sock_mapping *map;
> > +
> > +	if (flags & (MSG_CMSG_CLOEXEC|MSG_ERRQUEUE|MSG_OOB|MSG_TRUNC))
> > +		return -EOPNOTSUPP;
> > +
> > +	pvcalls_enter();
> > +	if (!pvcalls_front_dev) {
> > +		pvcalls_exit();
> > +		return -ENOTCONN;
> > +	}
> > +	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
> > +
> > +	map = (struct sock_mapping *) sock->sk->sk_send_head;
> > +	if (!map) {
> > +		pvcalls_exit();
> > +		return -ENOTSOCK;
> > +	}
> > +
> > +	mutex_lock(&map->active.in_mutex);
> > +	if (len > XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER))
> > +		len = XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER);
> > +
> > +	while (!(flags & MSG_DONTWAIT) && !pvcalls_front_read_todo(map)) {
> > +		wait_event_interruptible(map->active.inflight_conn_req,
> > +					 pvcalls_front_read_todo(map));
> > +	}
> > +	ret = __read_ring(map->active.ring, &map->active.data,
> > +			  &msg->msg_iter, len, flags);
> > +
> > +	if (ret > 0)
> > +		notify_remote_via_irq(map->active.irq);
> > +	if (ret == 0)
> > +		ret = -EAGAIN;
> 
> Why not 0? The manpage says:
> 
>        EAGAIN or EWOULDBLOCK
>               The  socket  is  marked nonblocking and the receive
> operation would block, or a receive timeout
>               had been set and the timeout expired before data was
> received.  POSIX.1 allows either error  to
>               be  returned  for  this case, and does not require these
> constants to have the same value, so a
>               portable application should check for both possibilities.
> 
> 
> I don't think either of these conditions is true here.
> 
> (Again, should have noticed this earlier, sorry)

If the caller passed MSG_DONTWAIT, then we should return -EAGAIN here.
However, it is true that if MSG_DONTWAIT is not set, then returning 0
would make more sense.

So I'll do:

if (ret == 0)
    ret = (flags & MSG_DONTWAIT) ? -EAGAIN : 0;
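
In context, the tail of pvcalls_front_recvmsg() would then read roughly
as below (a sketch of the agreed change, not the final commit):

	ret = __read_ring(map->active.ring, &map->active.data,
			  &msg->msg_iter, len, flags);

	if (ret > 0)
		notify_remote_via_irq(map->active.irq);
	if (ret == 0)
		ret = (flags & MSG_DONTWAIT) ? -EAGAIN : 0;
	if (ret == -ENOTCONN)
		ret = 0;

	mutex_unlock(&map->active.in_mutex);
	pvcalls_exit();
	return ret;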

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 10/13] xen/pvcalls: implement recvmsg
  2017-10-17 21:35     ` Boris Ostrovsky
@ 2017-10-20  1:38       ` Stefano Stabellini
  2017-10-20  1:38       ` Stefano Stabellini
  1 sibling, 0 replies; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-20  1:38 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: jgross, Stefano Stabellini, Stefano Stabellini, linux-kernel, xen-devel

On Tue, 17 Oct 2017, Boris Ostrovsky wrote:
> > +
> > +int pvcalls_front_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
> > +		     int flags)
> > +{
> > +	struct pvcalls_bedata *bedata;
> > +	int ret;
> > +	struct sock_mapping *map;
> > +
> > +	if (flags & (MSG_CMSG_CLOEXEC|MSG_ERRQUEUE|MSG_OOB|MSG_TRUNC))
> > +		return -EOPNOTSUPP;
> > +
> > +	pvcalls_enter();
> > +	if (!pvcalls_front_dev) {
> > +		pvcalls_exit();
> > +		return -ENOTCONN;
> > +	}
> > +	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
> > +
> > +	map = (struct sock_mapping *) sock->sk->sk_send_head;
> > +	if (!map) {
> > +		pvcalls_exit();
> > +		return -ENOTSOCK;
> > +	}
> > +
> > +	mutex_lock(&map->active.in_mutex);
> > +	if (len > XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER))
> > +		len = XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER);
> > +
> > +	while (!(flags & MSG_DONTWAIT) && !pvcalls_front_read_todo(map)) {
> > +		wait_event_interruptible(map->active.inflight_conn_req,
> > +					 pvcalls_front_read_todo(map));
> > +	}
> > +	ret = __read_ring(map->active.ring, &map->active.data,
> > +			  &msg->msg_iter, len, flags);
> > +
> > +	if (ret > 0)
> > +		notify_remote_via_irq(map->active.irq);
> > +	if (ret == 0)
> > +		ret = -EAGAIN;
> 
> Why not 0? The manpage says:
> 
>        EAGAIN or EWOULDBLOCK
>               The  socket  is  marked nonblocking and the receive
> operation would block, or a receive timeout
>               had been set and the timeout expired before data was
> received.  POSIX.1 allows either error  to
>               be  returned  for  this case, and does not require these
> constants to have the same value, so a
>               portable application should check for both possibilities.
> 
> 
> I don't think either of these conditions is true here.
> 
> (Again, should have noticed this earlier, sorry)

If the caller passed MSG_DONTWAIT, then we should return -EAGAIN here.
However, it is true that if MSG_DONTWAIT is not set, then returning 0
would make more sense.

So I'll do:

if (ret == 0)
    ret = (flags & MSG_DONTWAIT) ? -EAGAIN : 0;


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 09/13] xen/pvcalls: implement sendmsg
  2017-10-17 21:06     ` Boris Ostrovsky
  2017-10-20  1:41       ` Stefano Stabellini
@ 2017-10-20  1:41       ` Stefano Stabellini
  2017-10-20 14:44         ` Boris Ostrovsky
  2017-10-20 14:44         ` Boris Ostrovsky
  1 sibling, 2 replies; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-20  1:41 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Stefano Stabellini, xen-devel, linux-kernel, jgross, Stefano Stabellini

On Tue, 17 Oct 2017, Boris Ostrovsky wrote:
> > +static int __write_ring(struct pvcalls_data_intf *intf,
> > +			struct pvcalls_data *data,
> > +			struct iov_iter *msg_iter,
> > +			int len)
> > +{
> > +	RING_IDX cons, prod, size, masked_prod, masked_cons;
> > +	RING_IDX array_size = XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER);
> > +	int32_t error;
> > +
> > +	error = intf->out_error;
> > +	if (error < 0)
> > +		return error;
> > +	cons = intf->out_cons;
> > +	prod = intf->out_prod;
> > +	/* read indexes before continuing */
> > +	virt_mb();
> > +
> > +	size = pvcalls_queued(prod, cons, array_size);
> > +	if (size >= array_size)
> > +		return 0;
> 
> 
> I thought you were going to return an error here? If this can only be
> due to someone messing up indexes is there a reason to continue trying
> to write? What are the chances that the index will get corrected?

Sorry, I forgot. I'll change it to return an error, maybe EFAULT.
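
As a sketch, the check then becomes (the exact errno, EFAULT or EINVAL,
is settled further down the thread):

	size = pvcalls_queued(prod, cons, array_size);
	if (size >= array_size)
		return -EINVAL;	/* corrupted indexes, give up */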


> > +	if (len > array_size - size)
> > +		len = array_size - size;
> > +
> > +	masked_prod = pvcalls_mask(prod, array_size);
> > +	masked_cons = pvcalls_mask(cons, array_size);
> > +
> > +	if (masked_prod < masked_cons) {
> > +		copy_from_iter(data->out + masked_prod, len, msg_iter);
> > +	} else {
> > +		if (len > array_size - masked_prod) {
> > +			copy_from_iter(data->out + masked_prod,
> > +				       array_size - masked_prod, msg_iter);
> > +			copy_from_iter(data->out,
> > +				       len - (array_size - masked_prod),
> > +				       msg_iter);
> > +		} else {
> > +			copy_from_iter(data->out + masked_prod, len, msg_iter);
> > +		}
> > +	}
> > +	/* write to ring before updating pointer */
> > +	virt_wmb();
> > +	intf->out_prod += len;
> > +
> > +	return len;
> > +}
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 09/13] xen/pvcalls: implement sendmsg
  2017-10-17 21:06     ` Boris Ostrovsky
@ 2017-10-20  1:41       ` Stefano Stabellini
  2017-10-20  1:41       ` Stefano Stabellini
  1 sibling, 0 replies; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-20  1:41 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: jgross, Stefano Stabellini, Stefano Stabellini, linux-kernel, xen-devel

On Tue, 17 Oct 2017, Boris Ostrovsky wrote:
> > +static int __write_ring(struct pvcalls_data_intf *intf,
> > +			struct pvcalls_data *data,
> > +			struct iov_iter *msg_iter,
> > +			int len)
> > +{
> > +	RING_IDX cons, prod, size, masked_prod, masked_cons;
> > +	RING_IDX array_size = XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER);
> > +	int32_t error;
> > +
> > +	error = intf->out_error;
> > +	if (error < 0)
> > +		return error;
> > +	cons = intf->out_cons;
> > +	prod = intf->out_prod;
> > +	/* read indexes before continuing */
> > +	virt_mb();
> > +
> > +	size = pvcalls_queued(prod, cons, array_size);
> > +	if (size >= array_size)
> > +		return 0;
> 
> 
> I thought you were going to return an error here? If this can only be
> due to someone messing up indexes is there a reason to continue trying
> to write? What are the chances that the index will get corrected?

Sorry, I forgot. I'll change it to return an error, maybe EFAULT.


> > +	if (len > array_size - size)
> > +		len = array_size - size;
> > +
> > +	masked_prod = pvcalls_mask(prod, array_size);
> > +	masked_cons = pvcalls_mask(cons, array_size);
> > +
> > +	if (masked_prod < masked_cons) {
> > +		copy_from_iter(data->out + masked_prod, len, msg_iter);
> > +	} else {
> > +		if (len > array_size - masked_prod) {
> > +			copy_from_iter(data->out + masked_prod,
> > +				       array_size - masked_prod, msg_iter);
> > +			copy_from_iter(data->out,
> > +				       len - (array_size - masked_prod),
> > +				       msg_iter);
> > +		} else {
> > +			copy_from_iter(data->out + masked_prod, len, msg_iter);
> > +		}
> > +	}
> > +	/* write to ring before updating pointer */
> > +	virt_wmb();
> > +	intf->out_prod += len;
> > +
> > +	return len;
> > +}
> 


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 04/13] xen/pvcalls: implement socket command and handle events
  2017-10-20  1:26       ` Stefano Stabellini
  2017-10-20 14:24         ` Boris Ostrovsky
@ 2017-10-20 14:24         ` Boris Ostrovsky
  1 sibling, 0 replies; 73+ messages in thread
From: Boris Ostrovsky @ 2017-10-20 14:24 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, linux-kernel, jgross, Stefano Stabellini

On 10/19/2017 09:26 PM, Stefano Stabellini wrote:
> On Tue, 17 Oct 2017, Boris Ostrovsky wrote:
>> On 10/06/2017 08:30 PM, Stefano Stabellini wrote:

>> with one question:
>>
>>> +	/*
>>> +	 * PVCalls only supports domain AF_INET,
>>> +	 * type SOCK_STREAM and protocol 0 sockets for now.
>>> +	 *
>>> +	 * Check socket type here, AF_INET and protocol checks are done
>>> +	 * by the caller.
>>> +	 */
>>> +	if (sock->type != SOCK_STREAM)
>>> +		return -ENOTSUPP;
>>> +
>>
>>
>> Is this ENOTSUPP or EOPNOTSUPP? I didn't know the former even existed
>> and include/linux/errno.h suggests that this is NFSv3-specific.
> 
> The PVCalls spec says that unimplemented commands return ENOTSUPP,
> defined as -524. I guess that is why I used ENOTSUPP, but, actually,
> this is the return value to the caller, which has nothing to do with the
> PVCalls protocol return value. In fact, it could be something entirely
> different.
> 
> In this case, I think you are correct, it is best to use EOPNOTSUPP.
> I'll make the change and retain your Reviewed-by, if that's OK for you.
> 

Of course.

This all is somewhat convoluted:

man errno:

ENOTSUP      Operation not supported (POSIX.1)
EOPNOTSUPP   Operation not supported on socket (POSIX.1)
             (ENOTSUP  and EOPNOTSUPP have the same value on Linux, but
              according to POSIX.1 these error values should be
              distinct.)

/usr/include/bits/errno.h:
/* Linux has no ENOTSUP error code.  */
# define ENOTSUP EOPNOTSUPP


Linux kernel:
include/linux/errno.h:
/* Defined for the NFSv3 protocol */
...
#define ENOTSUPP        524     /* Operation is not supported */


include/uapi/asm-generic/errno.h:
#define EOPNOTSUPP      95      /* Operation not supported on transport
                                   endpoint */


ENOTSUP is not generally defined in Linux kernel.


Clear as mud.


-boris

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 04/13] xen/pvcalls: implement socket command and handle events
  2017-10-20  1:26       ` Stefano Stabellini
@ 2017-10-20 14:24         ` Boris Ostrovsky
  2017-10-20 14:24         ` Boris Ostrovsky
  1 sibling, 0 replies; 73+ messages in thread
From: Boris Ostrovsky @ 2017-10-20 14:24 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: jgross, Stefano Stabellini, linux-kernel, xen-devel

On 10/19/2017 09:26 PM, Stefano Stabellini wrote:
> On Tue, 17 Oct 2017, Boris Ostrovsky wrote:
>> On 10/06/2017 08:30 PM, Stefano Stabellini wrote:

>> with one question:
>>
>>> +	/*
>>> +	 * PVCalls only supports domain AF_INET,
>>> +	 * type SOCK_STREAM and protocol 0 sockets for now.
>>> +	 *
>>> +	 * Check socket type here, AF_INET and protocol checks are done
>>> +	 * by the caller.
>>> +	 */
>>> +	if (sock->type != SOCK_STREAM)
>>> +		return -ENOTSUPP;
>>> +
>>
>>
>> Is this ENOTSUPP or EOPNOTSUPP? I didn't know the former even existed
>> and include/linux/errno.h suggests that this is NFSv3-specific.
> 
> The PVCalls spec says that unimplemented commands return ENOTSUPP,
> defined as -524. I guess that is why I used ENOTSUPP, but, actually,
> this is the return value to the caller, which has nothing to do with the
> PVCalls protocol return value. In fact, it could be something entirely
> different.
> 
> In this case, I think you are correct, it is best to use EOPNOTSUPP.
> I'll make the change and retain your Reviewed-by, if that's OK for you.
> 

Of course.

This all is somewhat convoluted:

man errno:

ENOTSUP      Operation not supported (POSIX.1)
EOPNOTSUPP   Operation not supported on socket (POSIX.1)
             (ENOTSUP  and EOPNOTSUPP have the same value on Linux, but
              according to POSIX.1 these error values should be
              distinct.)

/usr/include/bits/errno.h:
/* Linux has no ENOTSUP error code.  */
# define ENOTSUP EOPNOTSUPP


Linux kernel:
include/linux/errno.h:
/* Defined for the NFSv3 protocol */
...
#define ENOTSUPP        524     /* Operation is not supported */


include/uapi/asm-generic/errno.h:
#define EOPNOTSUPP      95      /* Operation not supported on transport
                                   endpoint */


ENOTSUP is not generally defined in Linux kernel.


Clear as mud.


-boris


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 06/13] xen/pvcalls: implement bind command
  2017-10-20  1:31       ` Stefano Stabellini
  2017-10-20 14:40         ` Boris Ostrovsky
@ 2017-10-20 14:40         ` Boris Ostrovsky
  1 sibling, 0 replies; 73+ messages in thread
From: Boris Ostrovsky @ 2017-10-20 14:40 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, linux-kernel, jgross, Stefano Stabellini

On 10/19/2017 09:31 PM, Stefano Stabellini wrote:
> On Tue, 17 Oct 2017, Boris Ostrovsky wrote:
>> On 10/06/2017 08:30 PM, Stefano Stabellini wrote:
>>> +int pvcalls_front_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
>>> +{
>>> +	struct pvcalls_bedata *bedata;
>>> +	struct sock_mapping *map = NULL;
>>> +	struct xen_pvcalls_request *req;
>>> +	int notify, req_id, ret;
>>> +
>>> +	if (addr->sa_family != AF_INET || sock->type != SOCK_STREAM)
>>> +		return -ENOTSUPP;
>>> +
>>> +	pvcalls_enter();
>>> +	if (!pvcalls_front_dev) {
>>> +		pvcalls_exit();
>>> +		return -ENOTCONN;
>> The connect patch returns -ENETUNREACH here. Is there a deliberate
>> distinction between these cases?
> No, there isn't a deliberate distinction. Actually, all other commands
> return ENOTCONN for this error, we might as well be consistent and
> change ENETUNREACH to ENOTCONN for connect.
>
> If you agree, I'll make the change to the connect patch, and add your
> reviewed-by here.

It's already there ;-)

-boris


>
>
>
>> Other than that
>>
>> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 06/13] xen/pvcalls: implement bind command
  2017-10-20  1:31       ` Stefano Stabellini
@ 2017-10-20 14:40         ` Boris Ostrovsky
  2017-10-20 14:40         ` Boris Ostrovsky
  1 sibling, 0 replies; 73+ messages in thread
From: Boris Ostrovsky @ 2017-10-20 14:40 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: jgross, Stefano Stabellini, linux-kernel, xen-devel

On 10/19/2017 09:31 PM, Stefano Stabellini wrote:
> On Tue, 17 Oct 2017, Boris Ostrovsky wrote:
>> On 10/06/2017 08:30 PM, Stefano Stabellini wrote:
>>> +int pvcalls_front_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
>>> +{
>>> +	struct pvcalls_bedata *bedata;
>>> +	struct sock_mapping *map = NULL;
>>> +	struct xen_pvcalls_request *req;
>>> +	int notify, req_id, ret;
>>> +
>>> +	if (addr->sa_family != AF_INET || sock->type != SOCK_STREAM)
>>> +		return -ENOTSUPP;
>>> +
>>> +	pvcalls_enter();
>>> +	if (!pvcalls_front_dev) {
>>> +		pvcalls_exit();
>>> +		return -ENOTCONN;
>> The connect patch returns -ENETUNREACH here. Is there a deliberate
>> distinction between these cases?
> No, there isn't a deliberate distinction. Actually, all other commands
> return ENOTCONN for this error, we might as well be consistent and
> change ENETUNREACH to ENOTCONN for connect.
>
> If you agree, I'll make the change to the connect patch, and add your
> reviewed-by here.

It's already there ;-)

-boris


>
>
>
>> Other than that
>>
>> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 10/13] xen/pvcalls: implement recvmsg
  2017-10-20  1:38       ` Stefano Stabellini
@ 2017-10-20 14:43         ` Boris Ostrovsky
  2017-10-20 14:43         ` Boris Ostrovsky
  1 sibling, 0 replies; 73+ messages in thread
From: Boris Ostrovsky @ 2017-10-20 14:43 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, linux-kernel, jgross, Stefano Stabellini

On 10/19/2017 09:38 PM, Stefano Stabellini wrote:
> On Tue, 17 Oct 2017, Boris Ostrovsky wrote:
>>> +
>>> +int pvcalls_front_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
>>> +		     int flags)
>>> +{
>>> +	struct pvcalls_bedata *bedata;
>>> +	int ret;
>>> +	struct sock_mapping *map;
>>> +
>>> +	if (flags & (MSG_CMSG_CLOEXEC|MSG_ERRQUEUE|MSG_OOB|MSG_TRUNC))
>>> +		return -EOPNOTSUPP;
>>> +
>>> +	pvcalls_enter();
>>> +	if (!pvcalls_front_dev) {
>>> +		pvcalls_exit();
>>> +		return -ENOTCONN;
>>> +	}
>>> +	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
>>> +
>>> +	map = (struct sock_mapping *) sock->sk->sk_send_head;
>>> +	if (!map) {
>>> +		pvcalls_exit();
>>> +		return -ENOTSOCK;
>>> +	}
>>> +
>>> +	mutex_lock(&map->active.in_mutex);
>>> +	if (len > XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER))
>>> +		len = XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER);
>>> +
>>> +	while (!(flags & MSG_DONTWAIT) && !pvcalls_front_read_todo(map)) {
>>> +		wait_event_interruptible(map->active.inflight_conn_req,
>>> +					 pvcalls_front_read_todo(map));
>>> +	}
>>> +	ret = __read_ring(map->active.ring, &map->active.data,
>>> +			  &msg->msg_iter, len, flags);
>>> +
>>> +	if (ret > 0)
>>> +		notify_remote_via_irq(map->active.irq);
>>> +	if (ret == 0)
>>> +		ret = -EAGAIN;
>> Why not 0? The manpage says:
>>
>>        EAGAIN or EWOULDBLOCK
>>               The  socket  is  marked nonblocking and the receive
>> operation would block, or a receive timeout
>>               had been set and the timeout expired before data was
>> received.  POSIX.1 allows either error  to
>>               be  returned  for  this case, and does not require these
>> constants to have the same value, so a
>>               portable application should check for both possibilities.
>>
>>
>> I don't think either of these conditions is true here.
>>
>> (Again, should have noticed this earlier, sorry)
> In case the socket is MSG_DONTWAIT, then we should return -EAGAIN here.
> However, it is true that if the socket is not MSG_DONTWAIT, then
> returning 0 would make more sense.
>
> So I'll do:
>
> if (ret == 0)
>     ret = (flags & MSG_DONTWAIT) ? -EAGAIN : 0;

Sure. With that

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 10/13] xen/pvcalls: implement recvmsg
  2017-10-20  1:38       ` Stefano Stabellini
  2017-10-20 14:43         ` Boris Ostrovsky
@ 2017-10-20 14:43         ` Boris Ostrovsky
  1 sibling, 0 replies; 73+ messages in thread
From: Boris Ostrovsky @ 2017-10-20 14:43 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: jgross, Stefano Stabellini, linux-kernel, xen-devel

On 10/19/2017 09:38 PM, Stefano Stabellini wrote:
> On Tue, 17 Oct 2017, Boris Ostrovsky wrote:
>>> +
>>> +int pvcalls_front_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
>>> +		     int flags)
>>> +{
>>> +	struct pvcalls_bedata *bedata;
>>> +	int ret;
>>> +	struct sock_mapping *map;
>>> +
>>> +	if (flags & (MSG_CMSG_CLOEXEC|MSG_ERRQUEUE|MSG_OOB|MSG_TRUNC))
>>> +		return -EOPNOTSUPP;
>>> +
>>> +	pvcalls_enter();
>>> +	if (!pvcalls_front_dev) {
>>> +		pvcalls_exit();
>>> +		return -ENOTCONN;
>>> +	}
>>> +	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
>>> +
>>> +	map = (struct sock_mapping *) sock->sk->sk_send_head;
>>> +	if (!map) {
>>> +		pvcalls_exit();
>>> +		return -ENOTSOCK;
>>> +	}
>>> +
>>> +	mutex_lock(&map->active.in_mutex);
>>> +	if (len > XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER))
>>> +		len = XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER);
>>> +
>>> +	while (!(flags & MSG_DONTWAIT) && !pvcalls_front_read_todo(map)) {
>>> +		wait_event_interruptible(map->active.inflight_conn_req,
>>> +					 pvcalls_front_read_todo(map));
>>> +	}
>>> +	ret = __read_ring(map->active.ring, &map->active.data,
>>> +			  &msg->msg_iter, len, flags);
>>> +
>>> +	if (ret > 0)
>>> +		notify_remote_via_irq(map->active.irq);
>>> +	if (ret == 0)
>>> +		ret = -EAGAIN;
>> Why not 0? The manpage says:
>>
>>        EAGAIN or EWOULDBLOCK
>>               The  socket  is  marked nonblocking and the receive
>> operation would block, or a receive timeout
>>               had been set and the timeout expired before data was
>> received.  POSIX.1 allows either error  to
>>               be  returned  for  this case, and does not require these
>> constants to have the same value, so a
>>               portable application should check for both possibilities.
>>
>>
>> I don't think either of these conditions is true here.
>>
>> (Again, should have noticed this earlier, sorry)
> In case the socket is MSG_DONTWAIT, then we should return -EAGAIN here.
> However, it is true that if the socket is not MSG_DONTWAIT, then
> returning 0 would make more sense.
>
> So I'll do:
>
> if (ret == 0)
>     ret = (flags & MSG_DONTWAIT) ? -EAGAIN : 0;

Sure. With that

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 09/13] xen/pvcalls: implement sendmsg
  2017-10-20  1:41       ` Stefano Stabellini
  2017-10-20 14:44         ` Boris Ostrovsky
@ 2017-10-20 14:44         ` Boris Ostrovsky
  1 sibling, 0 replies; 73+ messages in thread
From: Boris Ostrovsky @ 2017-10-20 14:44 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, linux-kernel, jgross, Stefano Stabellini

On 10/19/2017 09:41 PM, Stefano Stabellini wrote:
> On Tue, 17 Oct 2017, Boris Ostrovsky wrote:
>>> +static int __write_ring(struct pvcalls_data_intf *intf,
>>> +			struct pvcalls_data *data,
>>> +			struct iov_iter *msg_iter,
>>> +			int len)
>>> +{
>>> +	RING_IDX cons, prod, size, masked_prod, masked_cons;
>>> +	RING_IDX array_size = XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER);
>>> +	int32_t error;
>>> +
>>> +	error = intf->out_error;
>>> +	if (error < 0)
>>> +		return error;
>>> +	cons = intf->out_cons;
>>> +	prod = intf->out_prod;
>>> +	/* read indexes before continuing */
>>> +	virt_mb();
>>> +
>>> +	size = pvcalls_queued(prod, cons, array_size);
>>> +	if (size >= array_size)
>>> +		return 0;
>>
>> I thought you were going to return an error here? If this can only be
>> due to someone messing up indexes is there a reason to continue trying
>> to write? What are the chances that the index will get corrected?
> Sorry, I forgot. I'll change it to return an error, maybe EFAULT.

I think EINVAL might be more appropriate. But either way you can tack on
my R-b to the patch.

-boris

>
>
>>> +	if (len > array_size - size)
>>> +		len = array_size - size;
>>> +
>>> +	masked_prod = pvcalls_mask(prod, array_size);
>>> +	masked_cons = pvcalls_mask(cons, array_size);
>>> +
>>> +	if (masked_prod < masked_cons) {
>>> +		copy_from_iter(data->out + masked_prod, len, msg_iter);
>>> +	} else {
>>> +		if (len > array_size - masked_prod) {
>>> +			copy_from_iter(data->out + masked_prod,
>>> +				       array_size - masked_prod, msg_iter);
>>> +			copy_from_iter(data->out,
>>> +				       len - (array_size - masked_prod),
>>> +				       msg_iter);
>>> +		} else {
>>> +			copy_from_iter(data->out + masked_prod, len, msg_iter);
>>> +		}
>>> +	}
>>> +	/* write to ring before updating pointer */
>>> +	virt_wmb();
>>> +	intf->out_prod += len;
>>> +
>>> +	return len;
>>> +}

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 09/13] xen/pvcalls: implement sendmsg
  2017-10-20  1:41       ` Stefano Stabellini
@ 2017-10-20 14:44         ` Boris Ostrovsky
  2017-10-20 14:44         ` Boris Ostrovsky
  1 sibling, 0 replies; 73+ messages in thread
From: Boris Ostrovsky @ 2017-10-20 14:44 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: jgross, Stefano Stabellini, linux-kernel, xen-devel

On 10/19/2017 09:41 PM, Stefano Stabellini wrote:
> On Tue, 17 Oct 2017, Boris Ostrovsky wrote:
>>> +static int __write_ring(struct pvcalls_data_intf *intf,
>>> +			struct pvcalls_data *data,
>>> +			struct iov_iter *msg_iter,
>>> +			int len)
>>> +{
>>> +	RING_IDX cons, prod, size, masked_prod, masked_cons;
>>> +	RING_IDX array_size = XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER);
>>> +	int32_t error;
>>> +
>>> +	error = intf->out_error;
>>> +	if (error < 0)
>>> +		return error;
>>> +	cons = intf->out_cons;
>>> +	prod = intf->out_prod;
>>> +	/* read indexes before continuing */
>>> +	virt_mb();
>>> +
>>> +	size = pvcalls_queued(prod, cons, array_size);
>>> +	if (size >= array_size)
>>> +		return 0;
>>
>> I thought you were going to return an error here? If this can only be
>> due to someone messing up indexes is there a reason to continue trying
>> to write? What are the chances that the index will get corrected?
> Sorry, I forgot. I'll change it to return an error, maybe EFAULT.

I think EINVAL might be more appropriate. But either way you can tack on
my R-b to the patch.

-boris

>
>
>>> +	if (len > array_size - size)
>>> +		len = array_size - size;
>>> +
>>> +	masked_prod = pvcalls_mask(prod, array_size);
>>> +	masked_cons = pvcalls_mask(cons, array_size);
>>> +
>>> +	if (masked_prod < masked_cons) {
>>> +		copy_from_iter(data->out + masked_prod, len, msg_iter);
>>> +	} else {
>>> +		if (len > array_size - masked_prod) {
>>> +			copy_from_iter(data->out + masked_prod,
>>> +				       array_size - masked_prod, msg_iter);
>>> +			copy_from_iter(data->out,
>>> +				       len - (array_size - masked_prod),
>>> +				       msg_iter);
>>> +		} else {
>>> +			copy_from_iter(data->out + masked_prod, len, msg_iter);
>>> +		}
>>> +	}
>>> +	/* write to ring before updating pointer */
>>> +	virt_wmb();
>>> +	intf->out_prod += len;
>>> +
>>> +	return len;
>>> +}



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 02/13] xen/pvcalls: implement frontend disconnect
  2017-10-17 16:01     ` Boris Ostrovsky
  2017-10-23 22:44       ` Stefano Stabellini
@ 2017-10-23 22:44       ` Stefano Stabellini
  1 sibling, 0 replies; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-23 22:44 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Stefano Stabellini, xen-devel, linux-kernel, jgross, Stefano Stabellini

On Tue, 17 Oct 2017, Boris Ostrovsky wrote:
> On 10/06/2017 08:30 PM, Stefano Stabellini wrote:
> > Introduce a data structure named pvcalls_bedata. It contains pointers to
> > the command ring, the event channel, a list of active sockets and a list
> > of passive sockets. Lists accesses are protected by a spin_lock.
> >
> > Introduce a waitqueue to allow waiting for a response on commands sent
> > to the backend.
> >
> > Introduce an array of struct xen_pvcalls_response to store commands
> > responses.
> >
> > pvcalls_refcount is used to keep count of the outstanding pvcalls users.
> > Only remove connections once the refcount is zero.
> >
> > Implement pvcalls frontend removal function. Go through the list of
> > active and passive sockets and free them all, one at a time.
> >
> > Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> > CC: boris.ostrovsky@oracle.com
> > CC: jgross@suse.com
> > ---
> >  drivers/xen/pvcalls-front.c | 67 +++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 67 insertions(+)
> >
> > diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
> > index a8d38c2..d8b7a04 100644
> > --- a/drivers/xen/pvcalls-front.c
> > +++ b/drivers/xen/pvcalls-front.c
> > @@ -20,6 +20,46 @@
> >  #include <xen/xenbus.h>
> >  #include <xen/interface/io/pvcalls.h>
> >  
> > +#define PVCALLS_INVALID_ID UINT_MAX
> > +#define PVCALLS_RING_ORDER XENBUS_MAX_RING_GRANT_ORDER
> > +#define PVCALLS_NR_REQ_PER_RING __CONST_RING_SIZE(xen_pvcalls, XEN_PAGE_SIZE)
> > +
> > +struct pvcalls_bedata {
> > +	struct xen_pvcalls_front_ring ring;
> > +	grant_ref_t ref;
> > +	int irq;
> > +
> > +	struct list_head socket_mappings;
> > +	struct list_head socketpass_mappings;
> > +	spinlock_t socket_lock;
> > +
> > +	wait_queue_head_t inflight_req;
> > +	struct xen_pvcalls_response rsp[PVCALLS_NR_REQ_PER_RING];
> 
> Did you mean _REQ_ or _RSP_ in the macro name?

For each request there is one response, so it doesn't make a difference.
But for clarity, I will rename.


> > +};
> > +/* Only one front/back connection supported. */
> > +static struct xenbus_device *pvcalls_front_dev;
> > +static atomic_t pvcalls_refcount;
> > +
> > +/* first increment refcount, then proceed */
> > +#define pvcalls_enter() {               \
> > +	atomic_inc(&pvcalls_refcount);      \
> > +}
> > +
> > +/* first complete other operations, then decrement refcount */
> > +#define pvcalls_exit() {                \
> > +	atomic_dec(&pvcalls_refcount);      \
> > +}
> > +
> > +static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
> > +{
> > +	return IRQ_HANDLED;
> > +}
> > +
> > +static void pvcalls_front_free_map(struct pvcalls_bedata *bedata,
> > +				   struct sock_mapping *map)
> > +{
> > +}
> > +
> >  static const struct xenbus_device_id pvcalls_front_ids[] = {
> >  	{ "pvcalls" },
> >  	{ "" }
> > @@ -27,6 +67,33 @@
> >  
> >  static int pvcalls_front_remove(struct xenbus_device *dev)
> >  {
> > +	struct pvcalls_bedata *bedata;
> > +	struct sock_mapping *map = NULL, *n;
> > +
> > +	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
> > +	dev_set_drvdata(&dev->dev, NULL);
> > +	pvcalls_front_dev = NULL;
> > +	if (bedata->irq >= 0)
> > +		unbind_from_irqhandler(bedata->irq, dev);
> > +
> > +	smp_mb();
> > +	while (atomic_read(&pvcalls_refcount) > 0)
> > +		cpu_relax();
> > +	list_for_each_entry_safe(map, n, &bedata->socket_mappings, list) {
> > +		pvcalls_front_free_map(bedata, map);
> > +		kfree(map);
> > +	}
> > +	list_for_each_entry_safe(map, n, &bedata->socketpass_mappings, list) {
> > +		spin_lock(&bedata->socket_lock);
> > +		list_del_init(&map->list);
> > +		spin_unlock(&bedata->socket_lock);
> > +		kfree(map);
> 
> Why do you re-init the entry if you are freeing it?

Fair enough, I'll just list_del.


> And do you really
> need the locks around it? This looks similar to the case we've discussed
> for other patches --- if we are concerned that someone may grab this
> entry then something must be wrong.
> 
> (Sorry, this must have been here in earlier versions but I only now
> noticed it.)

Yes, you are right: it is already protected by the global refcount, so
I'll remove the locking.
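
A sketch of the simplified teardown loop, with the list_del_init() and
the locking dropped as agreed above (illustration only):

	list_for_each_entry_safe(map, n, &bedata->socketpass_mappings, list) {
		list_del(&map->list);
		kfree(map);
	}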


> > +	}
> > +	if (bedata->ref >= 0)
> > +		gnttab_end_foreign_access(bedata->ref, 0, 0);
> > +	kfree(bedata->ring.sring);
> > +	kfree(bedata);
> > +	xenbus_switch_state(dev, XenbusStateClosed);
> >  	return 0;
> >  }
> >  
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 02/13] xen/pvcalls: implement frontend disconnect
  2017-10-17 16:01     ` Boris Ostrovsky
@ 2017-10-23 22:44       ` Stefano Stabellini
  2017-10-23 22:44       ` Stefano Stabellini
  1 sibling, 0 replies; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-23 22:44 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: jgross, Stefano Stabellini, Stefano Stabellini, linux-kernel, xen-devel

On Tue, 17 Oct 2017, Boris Ostrovsky wrote:
> On 10/06/2017 08:30 PM, Stefano Stabellini wrote:
> > Introduce a data structure named pvcalls_bedata. It contains pointers to
> > the command ring, the event channel, a list of active sockets and a list
> > of passive sockets. Lists accesses are protected by a spin_lock.
> >
> > Introduce a waitqueue to allow waiting for a response on commands sent
> > to the backend.
> >
> > Introduce an array of struct xen_pvcalls_response to store commands
> > responses.
> >
> > pvcalls_refcount is used to keep count of the outstanding pvcalls users.
> > Only remove connections once the refcount is zero.
> >
> > Implement pvcalls frontend removal function. Go through the list of
> > active and passive sockets and free them all, one at a time.
> >
> > Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> > CC: boris.ostrovsky@oracle.com
> > CC: jgross@suse.com
> > ---
> >  drivers/xen/pvcalls-front.c | 67 +++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 67 insertions(+)
> >
> > diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
> > index a8d38c2..d8b7a04 100644
> > --- a/drivers/xen/pvcalls-front.c
> > +++ b/drivers/xen/pvcalls-front.c
> > @@ -20,6 +20,46 @@
> >  #include <xen/xenbus.h>
> >  #include <xen/interface/io/pvcalls.h>
> >  
> > +#define PVCALLS_INVALID_ID UINT_MAX
> > +#define PVCALLS_RING_ORDER XENBUS_MAX_RING_GRANT_ORDER
> > +#define PVCALLS_NR_REQ_PER_RING __CONST_RING_SIZE(xen_pvcalls, XEN_PAGE_SIZE)
> > +
> > +struct pvcalls_bedata {
> > +	struct xen_pvcalls_front_ring ring;
> > +	grant_ref_t ref;
> > +	int irq;
> > +
> > +	struct list_head socket_mappings;
> > +	struct list_head socketpass_mappings;
> > +	spinlock_t socket_lock;
> > +
> > +	wait_queue_head_t inflight_req;
> > +	struct xen_pvcalls_response rsp[PVCALLS_NR_REQ_PER_RING];
> 
> Did you mean _REQ_ or _RSP_ in the macro name?

For each request there is one response, so it doesn't make a difference.
But for clarity, I will rename.


> > +};
> > +/* Only one front/back connection supported. */
> > +static struct xenbus_device *pvcalls_front_dev;
> > +static atomic_t pvcalls_refcount;
> > +
> > +/* first increment refcount, then proceed */
> > +#define pvcalls_enter() {               \
> > +	atomic_inc(&pvcalls_refcount);      \
> > +}
> > +
> > +/* first complete other operations, then decrement refcount */
> > +#define pvcalls_exit() {                \
> > +	atomic_dec(&pvcalls_refcount);      \
> > +}
> > +
> > +static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
> > +{
> > +	return IRQ_HANDLED;
> > +}
> > +
> > +static void pvcalls_front_free_map(struct pvcalls_bedata *bedata,
> > +				   struct sock_mapping *map)
> > +{
> > +}
> > +
> >  static const struct xenbus_device_id pvcalls_front_ids[] = {
> >  	{ "pvcalls" },
> >  	{ "" }
> > @@ -27,6 +67,33 @@
> >  
> >  static int pvcalls_front_remove(struct xenbus_device *dev)
> >  {
> > +	struct pvcalls_bedata *bedata;
> > +	struct sock_mapping *map = NULL, *n;
> > +
> > +	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
> > +	dev_set_drvdata(&dev->dev, NULL);
> > +	pvcalls_front_dev = NULL;
> > +	if (bedata->irq >= 0)
> > +		unbind_from_irqhandler(bedata->irq, dev);
> > +
> > +	smp_mb();
> > +	while (atomic_read(&pvcalls_refcount) > 0)
> > +		cpu_relax();
> > +	list_for_each_entry_safe(map, n, &bedata->socket_mappings, list) {
> > +		pvcalls_front_free_map(bedata, map);
> > +		kfree(map);
> > +	}
> > +	list_for_each_entry_safe(map, n, &bedata->socketpass_mappings, list) {
> > +		spin_lock(&bedata->socket_lock);
> > +		list_del_init(&map->list);
> > +		spin_unlock(&bedata->socket_lock);
> > +		kfree(map);
> 
> Why do you re-init the entry if you are freeing it?

Fair enough, I'll just list_del.


> And do you really
> need the locks around it? This looks similar to the case we've discussed
> for other patches --- if we are concerned that someone may grab this
> entry then something must be wrong.
> 
> (Sorry, this must have been here in earlier versions but I only now
> noticed it.)

Yes, you are right: it is already protected by the global refcount, so
I'll remove the locking.


> > +	}
> > +	if (bedata->ref >= 0)
> > +		gnttab_end_foreign_access(bedata->ref, 0, 0);
> > +	kfree(bedata->ring.sring);
> > +	kfree(bedata);
> > +	xenbus_switch_state(dev, XenbusStateClosed);
> >  	return 0;
> >  }
> >  
> 


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 08/13] xen/pvcalls: implement accept command
  2017-10-17 18:34     ` Boris Ostrovsky
@ 2017-10-23 23:03       ` Stefano Stabellini
  2017-10-24 13:52         ` Boris Ostrovsky
  2017-10-24 13:52         ` Boris Ostrovsky
  2017-10-23 23:03       ` Stefano Stabellini
  1 sibling, 2 replies; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-23 23:03 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Stefano Stabellini, xen-devel, linux-kernel, jgross, Stefano Stabellini

On Tue, 17 Oct 2017, Boris Ostrovsky wrote:
> On 10/06/2017 08:30 PM, Stefano Stabellini wrote:
> > Introduce a waitqueue to allow only one outstanding accept command at
> > any given time and to implement polling on the passive socket. Introduce
> > a flags field to keep track of in-flight accept and poll commands.
> > 
> > Send PVCALLS_ACCEPT to the backend. Allocate a new active socket. Make
> > sure that only one accept command is executed at any given time by
> > setting PVCALLS_FLAG_ACCEPT_INFLIGHT and waiting on the
> > inflight_accept_req waitqueue.
> > 
> > Convert the new struct sock_mapping pointer into an uint64_t and use it
> > as id for the new socket to pass to the backend.
> > 
> > Check if the accept call is non-blocking: in that case after sending the
> > ACCEPT command to the backend store the sock_mapping pointer of the new
> > struct and the inflight req_id then return -EAGAIN (which will respond
> > only when there is something to accept). Next time accept is called,
> > we'll check if the ACCEPT command has been answered, if so we'll pick up
> > where we left off, otherwise we return -EAGAIN again.
> > 
> > Note that, differently from the other commands, we can use
> > wait_event_interruptible (instead of wait_event) in the case of accept
> > as we are able to track the req_id of the ACCEPT response that we are
> > waiting.
> > 
> > Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> > CC: boris.ostrovsky@oracle.com
> > CC: jgross@suse.com
> > ---
> >  drivers/xen/pvcalls-front.c | 146 ++++++++++++++++++++++++++++++++++++++++++++
> >  drivers/xen/pvcalls-front.h |   3 +
> >  2 files changed, 149 insertions(+)
> > 
> > diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
> > index 5433fae..8958e74 100644
> > --- a/drivers/xen/pvcalls-front.c
> > +++ b/drivers/xen/pvcalls-front.c
> > @@ -77,6 +77,16 @@ struct sock_mapping {
> >  #define PVCALLS_STATUS_BIND          1
> >  #define PVCALLS_STATUS_LISTEN        2
> >  			uint8_t status;
> > +		/*
> > +		 * Internal state-machine flags.
> > +		 * Only one accept operation can be inflight for a socket.
> > +		 * Only one poll operation can be inflight for a given socket.
> > +		 */
> > +#define PVCALLS_FLAG_ACCEPT_INFLIGHT 0
> > +			uint8_t flags;
> > +			uint32_t inflight_req_id;
> > +			struct sock_mapping *accept_map;
> > +			wait_queue_head_t inflight_accept_req;
> >  		} passive;
> >  	};
> >  };
> > @@ -392,6 +402,8 @@ int pvcalls_front_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
> >  	memcpy(req->u.bind.addr, addr, sizeof(*addr));
> >  	req->u.bind.len = addr_len;
> >  
> > +	init_waitqueue_head(&map->passive.inflight_accept_req);
> > +
> >  	map->active_socket = false;
> >  
> >  	bedata->ring.req_prod_pvt++;
> > @@ -470,6 +482,140 @@ int pvcalls_front_listen(struct socket *sock, int backlog)
> >  	return ret;
> >  }
> >  
> > +int pvcalls_front_accept(struct socket *sock, struct socket *newsock, int flags)
> > +{
> > +	struct pvcalls_bedata *bedata;
> > +	struct sock_mapping *map;
> > +	struct sock_mapping *map2 = NULL;
> > +	struct xen_pvcalls_request *req;
> > +	int notify, req_id, ret, evtchn, nonblock;
> > +
> > +	pvcalls_enter();
> > +	if (!pvcalls_front_dev) {
> > +		pvcalls_exit();
> > +		return -ENOTCONN;
> > +	}
> > +	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
> > +
> > +	map = (struct sock_mapping *) sock->sk->sk_send_head;
> > +	if (!map) {
> > +		pvcalls_exit();
> > +		return -ENOTSOCK;
> > +	}
> > +
> > +	if (map->passive.status != PVCALLS_STATUS_LISTEN) {
> > +		pvcalls_exit();
> > +		return -EINVAL;
> > +	}
> > +
> > +	nonblock = flags & SOCK_NONBLOCK;
> > +	/*
> > +	 * Backend only supports 1 inflight accept request, will return
> > +	 * errors for the others
> > +	 */
> > +	if (test_and_set_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
> > +			     (void *)&map->passive.flags)) {
> > +		req_id = READ_ONCE(map->passive.inflight_req_id);
> > +		if (req_id != PVCALLS_INVALID_ID &&
> > +		    READ_ONCE(bedata->rsp[req_id].req_id) == req_id) {
> 
> 
> READ_ONCE (especially the second one)? I know I may sound fixated on
> this but I really don't understand how the compiler may do anything wrong if
> straight reads were used.
> 
> For the first case, I guess, theoretically the compiler may decide to
> re-fetch map->passive.inflight_req_id. But even if it did, would that be
> a problem? Both of these READ_ONCE targets are updated below before
> PVCALLS_FLAG_ACCEPT_INFLIGHT is cleared so there should not be any
> change between re-fetching, I think. (The only exception is the nonblock
> case, which does a WRITE_ONCE that I don't understand either)

READ_ONCE is reasonably cheap: do we really want to have this kind of
conversation every time we touch this code in the future? Personally, I
would have used READ/WRITE_ONCE everywhere for inflight_req_id and
req_id, because it makes the code easier to understand. 

We have already limited their usage, but at least we have followed a set
of guidelines. Doing further optimizations on this code seems
unnecessary and prone to confuse the reader.
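
To illustrate the pattern in question, this is roughly how the check
reads with the annotations spelled out (a commented sketch of the code
above, not a proposed change):

	/*
	 * Read the in-flight id once, then read the matching response
	 * slot once: the pairing tells us the earlier ACCEPT has been
	 * answered, so neither value should be silently re-fetched by
	 * the compiler in between.
	 */
	req_id = READ_ONCE(map->passive.inflight_req_id);
	if (req_id != PVCALLS_INVALID_ID &&
	    READ_ONCE(bedata->rsp[req_id].req_id) == req_id) {
		/* pick up the accept_map stashed by the nonblocking call */
		map2 = map->passive.accept_map;
	}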


> > +			map2 = map->passive.accept_map;
> > +			goto received;
> > +		}
> > +		if (nonblock) {
> > +			pvcalls_exit();
> > +			return -EAGAIN;
> > +		}
> > +		if (wait_event_interruptible(map->passive.inflight_accept_req,
> > +			!test_and_set_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
> > +					  (void *)&map->passive.flags))) {
> > +			pvcalls_exit();
> > +			return -EINTR;
> > +		}
> > +	}
> > +
> > +	spin_lock(&bedata->socket_lock);
> > +	ret = get_request(bedata, &req_id);
> > +	if (ret < 0) {
> > +		clear_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
> > +			  (void *)&map->passive.flags);
> > +		spin_unlock(&bedata->socket_lock);
> > +		pvcalls_exit();
> > +		return ret;
> > +	}
> > +	map2 = kzalloc(sizeof(*map2), GFP_KERNEL);
> > +	if (map2 == NULL) {
> > +		clear_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
> > +			  (void *)&map->passive.flags);
> > +		spin_unlock(&bedata->socket_lock);
> > +		pvcalls_exit();
> > +		return -ENOMEM;
> > +	}
> > +	ret =  create_active(map2, &evtchn);
> > +	if (ret < 0) {
> > +		kfree(map2);
> > +		clear_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
> > +			  (void *)&map->passive.flags);
> > +		spin_unlock(&bedata->socket_lock);
> > +		pvcalls_exit();
> > +		return -ENOMEM;
> 
> Why not ret?

yes, good idea.


> 
> > +	}
> > +	list_add_tail(&map2->list, &bedata->socket_mappings);
> > +
> > +	req = RING_GET_REQUEST(&bedata->ring, req_id);
> > +	req->req_id = req_id;
> > +	req->cmd = PVCALLS_ACCEPT;
> > +	req->u.accept.id = (uint64_t) map;
> > +	req->u.accept.ref = map2->active.ref;
> > +	req->u.accept.id_new = (uint64_t) map2;
> > +	req->u.accept.evtchn = evtchn;
> > +	map->passive.accept_map = map2;
> > +
> > +	bedata->ring.req_prod_pvt++;
> > +	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify);
> > +	spin_unlock(&bedata->socket_lock);
> > +	if (notify)
> > +		notify_remote_via_irq(bedata->irq);
> > +	/* We could check if we have received a response before returning. */
> > +	if (nonblock) {
> > +		WRITE_ONCE(map->passive.inflight_req_id, req_id);
> > +		pvcalls_exit();
> > +		return -EAGAIN;
> > +	}
> > +
> > +	if (wait_event_interruptible(bedata->inflight_req,
> > +		READ_ONCE(bedata->rsp[req_id].req_id) == req_id)) {
> > +		pvcalls_exit();
> > +		return -EINTR;
> > +	}
> > +	/* read req_id, then the content */
> > +	smp_rmb();
> > +
> > +received:
> > +	map2->sock = newsock;
> > +	newsock->sk = kzalloc(sizeof(*newsock->sk), GFP_KERNEL);
> > +	if (!newsock->sk) {
> > +		bedata->rsp[req_id].req_id = PVCALLS_INVALID_ID;
> > +		map->passive.inflight_req_id = PVCALLS_INVALID_ID;
> > +		clear_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
> > +			  (void *)&map->passive.flags);
> > +		pvcalls_front_free_map(bedata, map2);
> > +		kfree(map2);
> > +		pvcalls_exit();
> > +		return -ENOMEM;
> > +	}
> > +	newsock->sk->sk_send_head = (void *)map2;
> > +
> > +	ret = bedata->rsp[req_id].ret;
> > +	bedata->rsp[req_id].req_id = PVCALLS_INVALID_ID;
> > +	map->passive.inflight_req_id = PVCALLS_INVALID_ID;
> > +
> > +	clear_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT, (void *)&map->passive.flags);
> > +	wake_up(&map->passive.inflight_accept_req);
> > +
> > +	pvcalls_exit();
> > +	return ret;
> > +}
> > +
> >  static const struct xenbus_device_id pvcalls_front_ids[] = {
> >  	{ "pvcalls" },
> >  	{ "" }
> > diff --git a/drivers/xen/pvcalls-front.h b/drivers/xen/pvcalls-front.h
> > index aa8fe10..ab4f1da 100644
> > --- a/drivers/xen/pvcalls-front.h
> > +++ b/drivers/xen/pvcalls-front.h
> > @@ -10,5 +10,8 @@ int pvcalls_front_bind(struct socket *sock,
> >  		       struct sockaddr *addr,
> >  		       int addr_len);
> >  int pvcalls_front_listen(struct socket *sock, int backlog);
> > +int pvcalls_front_accept(struct socket *sock,
> > +			 struct socket *newsock,
> > +			 int flags);
> >  
> >  #endif
> > 
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 11/13] xen/pvcalls: implement poll command
  2017-10-17 22:15       ` Boris Ostrovsky
  (?)
  (?)
@ 2017-10-23 23:06       ` Stefano Stabellini
  2017-10-24 14:02         ` Boris Ostrovsky
  2017-10-24 14:02         ` Boris Ostrovsky
  -1 siblings, 2 replies; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-23 23:06 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Stefano Stabellini, xen-devel, linux-kernel, jgross, Stefano Stabellini

On Tue, 17 Oct 2017, Boris Ostrovsky wrote:
> > +static unsigned int pvcalls_front_poll_passive(struct file *file,
> > +					       struct pvcalls_bedata *bedata,
> > +					       struct sock_mapping *map,
> > +					       poll_table *wait)
> > +{
> > +	int notify, req_id, ret;
> > +	struct xen_pvcalls_request *req;
> > +
> > +	if (test_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
> > +		     (void *)&map->passive.flags)) {
> > +		uint32_t req_id = READ_ONCE(map->passive.inflight_req_id);
> > +
> > +		if (req_id != PVCALLS_INVALID_ID &&
> > +		    READ_ONCE(bedata->rsp[req_id].req_id) == req_id)
> > +			return POLLIN | POLLRDNORM;
> 
> 
> Same READ_ONCE() question as for an earlier patch.

Same answer :-)


> > +
> > +		poll_wait(file, &map->passive.inflight_accept_req, wait);
> > +		return 0;
> > +	}
> > +
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 08/13] xen/pvcalls: implement accept command
  2017-10-23 23:03       ` Stefano Stabellini
@ 2017-10-24 13:52         ` Boris Ostrovsky
  2017-10-24 16:42           ` Stefano Stabellini
  2017-10-24 16:42           ` Stefano Stabellini
  2017-10-24 13:52         ` Boris Ostrovsky
  1 sibling, 2 replies; 73+ messages in thread
From: Boris Ostrovsky @ 2017-10-24 13:52 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, linux-kernel, jgross, Stefano Stabellini

On 10/23/2017 07:03 PM, Stefano Stabellini wrote:
> On Tue, 17 Oct 2017, Boris Ostrovsky wrote:
>> On 10/06/2017 08:30 PM, Stefano Stabellini wrote:
>>> +	/*
>>> +	 * Backend only supports 1 inflight accept request, will return
>>> +	 * errors for the others
>>> +	 */
>>> +	if (test_and_set_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
>>> +			     (void *)&map->passive.flags)) {
>>> +		req_id = READ_ONCE(map->passive.inflight_req_id);
>>> +		if (req_id != PVCALLS_INVALID_ID &&
>>> +		    READ_ONCE(bedata->rsp[req_id].req_id) == req_id) {
>>
>> READ_ONCE (especially the second one)? I know I may sound fixated on
>> this but I really don't understand how the compiler may do anything wrong if
>> straight reads were used.
>>
>> For the first case, I guess, theoretically the compiler may decide to
>> re-fetch map->passive.inflight_req_id. But even if it did, would that be
>> a problem? Both of these READ_ONCE targets are updated below before
>> PVCALLS_FLAG_ACCEPT_INFLIGHT is cleared so there should not be any
>> change between re-fetching, I think. (The only exception is the nonblock
>> case, which does a WRITE_ONCE that I don't understand either)
> READ_ONCE is reasonably cheap: do we really want to have this kind of
> conversation every time we touch this code in the future? Personally, I
> would have used READ/WRITE_ONCE everywhere for inflight_req_id and
> req_id, because it makes the code easier to understand.

I guess it's a matter of opinion. I actually think it's harder to read.

But it doesn't make the code wrong so...

>
> We have already limited their usage, but at least we have followed a set
> of guidelines. Doing further optimizations on this code seems
> unnecessary and prone to confuse the reader.
>
>

>>> +	ret =  create_active(map2, &evtchn);
>>> +	if (ret < 0) {
>>> +		kfree(map2);
>>> +		clear_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
>>> +			  (void *)&map->passive.flags);
>>> +		spin_unlock(&bedata->socket_lock);
>>> +		pvcalls_exit();
>>> +		return -ENOMEM;
>> Why not ret?
> yes, good idea.

With that fixed (and extra space removed in 'ret =  create_active(map2,
&evtchn);')

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 11/13] xen/pvcalls: implement poll command
  2017-10-23 23:06       ` Stefano Stabellini
  2017-10-24 14:02         ` Boris Ostrovsky
@ 2017-10-24 14:02         ` Boris Ostrovsky
  1 sibling, 0 replies; 73+ messages in thread
From: Boris Ostrovsky @ 2017-10-24 14:02 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, linux-kernel, jgross, Stefano Stabellini

On 10/23/2017 07:06 PM, Stefano Stabellini wrote:
> On Tue, 17 Oct 2017, Boris Ostrovsky wrote:
>>> +static unsigned int pvcalls_front_poll_passive(struct file *file,
>>> +					       struct pvcalls_bedata *bedata,
>>> +					       struct sock_mapping *map,
>>> +					       poll_table *wait)
>>> +{
>>> +	int notify, req_id, ret;
>>> +	struct xen_pvcalls_request *req;
>>> +
>>> +	if (test_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
>>> +		     (void *)&map->passive.flags)) {
>>> +		uint32_t req_id = READ_ONCE(map->passive.inflight_req_id);
>>> +
>>> +		if (req_id != PVCALLS_INVALID_ID &&
>>> +		    READ_ONCE(bedata->rsp[req_id].req_id) == req_id)
>>> +			return POLLIN | POLLRDNORM;
>>
>> Same READ_ONCE() question as for an earlier patch.
> Same answer :-)


Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

>
>
>>> +
>>> +		poll_wait(file, &map->passive.inflight_accept_req, wait);
>>> +		return 0;
>>> +	}
>>> +

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 12/13] xen/pvcalls: implement release command
  2017-10-07  0:30     ` Stefano Stabellini
  (?)
  (?)
@ 2017-10-24 14:17     ` Boris Ostrovsky
  2017-10-24 17:17       ` Stefano Stabellini
  2017-10-24 17:17       ` Stefano Stabellini
  -1 siblings, 2 replies; 73+ messages in thread
From: Boris Ostrovsky @ 2017-10-24 14:17 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: linux-kernel, jgross, Stefano Stabellini

(I just noticed that I missed this patch, sorry)

On 10/06/2017 08:30 PM, Stefano Stabellini wrote:
> Send PVCALLS_RELEASE to the backend and wait for a reply. Take both
> in_mutex and out_mutex to avoid concurrent accesses. Then, free the
> socket.
>
> For passive sockets, check whether we have already pre-allocated an
> active socket for the purpose of being accepted. If so, free that as
> well.
>
> Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> CC: boris.ostrovsky@oracle.com
> CC: jgross@suse.com
> ---
>  drivers/xen/pvcalls-front.c | 98 +++++++++++++++++++++++++++++++++++++++++++++
>  drivers/xen/pvcalls-front.h |  1 +
>  2 files changed, 99 insertions(+)
>
> diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
> index aca2b32..9beb34d 100644
> --- a/drivers/xen/pvcalls-front.c
> +++ b/drivers/xen/pvcalls-front.c
> @@ -200,6 +200,19 @@ static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
>  static void pvcalls_front_free_map(struct pvcalls_bedata *bedata,
>  				   struct sock_mapping *map)
>  {
> +	int i;
> +
> +	unbind_from_irqhandler(map->active.irq, map);
> +
> +	spin_lock(&bedata->socket_lock);
> +	if (!list_empty(&map->list))
> +		list_del_init(&map->list);

As with patch 2, do you need to init this? In fact, do you need to do
anything with the list? We are about to free the map (and so maybe bring
'kfree(map)' inside here, btw?)

And what does it mean if the list is not empty? Is it OK to free the map?

> +	spin_unlock(&bedata->socket_lock);
> +
> +	for (i = 0; i < (1 << PVCALLS_RING_ORDER); i++)
> +		gnttab_end_foreign_access(map->active.ring->ref[i], 0, 0);
> +	gnttab_end_foreign_access(map->active.ref, 0, 0);
> +	free_page((unsigned long)map->active.ring);
>  }
>  
>  static irqreturn_t pvcalls_front_conn_handler(int irq, void *sock_map)
> @@ -968,6 +981,91 @@ unsigned int pvcalls_front_poll(struct file *file, struct socket *sock,
>  	return ret;
>  }
>  


> +
> +	if (map->active_socket) {
> +		/*
> +		 * Set in_error and wake up inflight_conn_req to force
> +		 * recvmsg waiters to exit.
> +		 */
> +		map->active.ring->in_error = -EBADF;
> +		wake_up_interruptible(&map->active.inflight_conn_req);
> +
> +		/*
> +		 * Wait until there are no more waiters on the mutexes.
> +		 * We know that no new waiters can be added because sk_send_head
> +		 * is set to NULL -- we only need to wait for the existing
> +		 * waiters to return.
> +		 */
> +		while (!mutex_trylock(&map->active.in_mutex) ||
> +			   !mutex_trylock(&map->active.out_mutex))
> +			cpu_relax();
> +
> +		pvcalls_front_free_map(bedata, map);
> +		kfree(map);
> +	} else {
> +		spin_lock(&bedata->socket_lock);
> +		if (READ_ONCE(map->passive.inflight_req_id) !=
> +		    PVCALLS_INVALID_ID) {
> +			pvcalls_front_free_map(bedata,

pvcalls_front_free_map will try to grab bedata->socket_lock, which we are already holding.


> +					       map->passive.accept_map);
> +			kfree(map->passive.accept_map);
> +		}
> +		list_del_init(&map->list);

Again, no init?

-boris

> +		kfree(map);
> +		spin_unlock(&bedata->socket_lock);
> +	}
> +	WRITE_ONCE(bedata->rsp[req_id].req_id, PVCALLS_INVALID_ID);
>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 08/13] xen/pvcalls: implement accept command
  2017-10-24 13:52         ` Boris Ostrovsky
  2017-10-24 16:42           ` Stefano Stabellini
@ 2017-10-24 16:42           ` Stefano Stabellini
  1 sibling, 0 replies; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-24 16:42 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Stefano Stabellini, xen-devel, linux-kernel, jgross, Stefano Stabellini

On Tue, 24 Oct 2017, Boris Ostrovsky wrote:
> On 10/23/2017 07:03 PM, Stefano Stabellini wrote:
> > On Tue, 17 Oct 2017, Boris Ostrovsky wrote:
> >> On 10/06/2017 08:30 PM, Stefano Stabellini wrote:
> >>> +	/*
> >>> +	 * Backend only supports 1 inflight accept request, will return
> >>> +	 * errors for the others
> >>> +	 */
> >>> +	if (test_and_set_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
> >>> +			     (void *)&map->passive.flags)) {
> >>> +		req_id = READ_ONCE(map->passive.inflight_req_id);
> >>> +		if (req_id != PVCALLS_INVALID_ID &&
> >>> +		    READ_ONCE(bedata->rsp[req_id].req_id) == req_id) {
> >>
> >> READ_ONCE (especially the second one)? I know I may sound fixated on
> >> this but I really don't understand how the compiler may do anything wrong if
> >> straight reads were used.
> >>
> >> For the first case, I guess, theoretically the compiler may decide to
> >> re-fetch map->passive.inflight_req_id. But even if it did, would that be
> >> a problem? Both of these READ_ONCE targets are updated below before
> >> PVCALLS_FLAG_ACCEPT_INFLIGHT is cleared so there should not be any
> >> change between re-fetching, I think. (The only exception is the nonblock
> >> case, which does a WRITE_ONCE that I don't understand either)
> > READ_ONCE is reasonably cheap: do we really want to have this kind of
> > conversation every time we touch this code in the future? Personally, I
> > would have used READ/WRITE_ONCE everywhere for inflight_req_id and
> > req_id, because it makes the code easier to understand.
> 
> I guess it's a matter of opinion. I actually think it's harder to read.
> 
> But it doesn't make the code wrong so...
> 
> >
> > We have already limited their usage, but at least we have followed a set
> > of guidelines. Doing further optimizations on this code seems
> > unnecessary and prone to confuse the reader.
> >
> >
> 
> >>> +	ret =  create_active(map2, &evtchn);
> >>> +	if (ret < 0) {
> >>> +		kfree(map2);
> >>> +		clear_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
> >>> +			  (void *)&map->passive.flags);
> >>> +		spin_unlock(&bedata->socket_lock);
> >>> +		pvcalls_exit();
> >>> +		return -ENOMEM;
> >> Why not ret?
> > yes, good idea.
> 
> With that fixed (and extra space removed in 'ret =  create_active(map2,
> &evtchn);')
> 
> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Thank you!

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 12/13] xen/pvcalls: implement release command
  2017-10-24 14:17     ` Boris Ostrovsky
@ 2017-10-24 17:17       ` Stefano Stabellini
  2017-10-24 17:17       ` Stefano Stabellini
  1 sibling, 0 replies; 73+ messages in thread
From: Stefano Stabellini @ 2017-10-24 17:17 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Stefano Stabellini, xen-devel, linux-kernel, jgross, Stefano Stabellini

On Tue, 24 Oct 2017, Boris Ostrovsky wrote:
> (I just noticed that I missed this patch, sorry)

Thanks for the review!


> On 10/06/2017 08:30 PM, Stefano Stabellini wrote:
> > Send PVCALLS_RELEASE to the backend and wait for a reply. Take both
> > in_mutex and out_mutex to avoid concurrent accesses. Then, free the
> > socket.
> >
> > For passive sockets, check whether we have already pre-allocated an
> > active socket for the purpose of being accepted. If so, free that as
> > well.
> >
> > Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
> > CC: boris.ostrovsky@oracle.com
> > CC: jgross@suse.com
> > ---
> >  drivers/xen/pvcalls-front.c | 98 +++++++++++++++++++++++++++++++++++++++++++++
> >  drivers/xen/pvcalls-front.h |  1 +
> >  2 files changed, 99 insertions(+)
> >
> > diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
> > index aca2b32..9beb34d 100644
> > --- a/drivers/xen/pvcalls-front.c
> > +++ b/drivers/xen/pvcalls-front.c
> > @@ -200,6 +200,19 @@ static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
> >  static void pvcalls_front_free_map(struct pvcalls_bedata *bedata,
> >  				   struct sock_mapping *map)
> >  {
> > +	int i;
> > +
> > +	unbind_from_irqhandler(map->active.irq, map);
> > +
> > +	spin_lock(&bedata->socket_lock);
> > +	if (!list_empty(&map->list))
> > +		list_del_init(&map->list);
> 
> As with patch 2, do you need to init this? In fact, do you need to do
> anything with the list? We are about to free the map (and so maybe bring
> 'kfree(map)' inside here, btw?)
> 
> And what does it mean if the list is not empty? Is it OK to free the map?

Yes, list_del_init should be just list_del in this case.

These two lines are only there to remove the map from socket_mappings if
the map is part of one. Normally, map->list should NOT be empty.

Yes, kfree(map) could be in pvcalls_front_free_map, I'll make the
change.


I have just noticed that we have a socketpass_mappings in struct
pvcalls_bedata that used to be used in earlier versions of this series,
but it is now unused. Today, we just use socket_mappings for both active
and passive sockets. I'll remove it and fix pvcalls_front_remove
accordingly.


> > +	spin_unlock(&bedata->socket_lock);
> > +
> > +	for (i = 0; i < (1 << PVCALLS_RING_ORDER); i++)
> > +		gnttab_end_foreign_access(map->active.ring->ref[i], 0, 0);
> > +	gnttab_end_foreign_access(map->active.ref, 0, 0);
> > +	free_page((unsigned long)map->active.ring);
> >  }
> >  
> >  static irqreturn_t pvcalls_front_conn_handler(int irq, void *sock_map)
> > @@ -968,6 +981,91 @@ unsigned int pvcalls_front_poll(struct file *file, struct socket *sock,
> >  	return ret;
> >  }
> >  
> 
> 
> > +
> > +	if (map->active_socket) {
> > +		/*
> > +		 * Set in_error and wake up inflight_conn_req to force
> > +		 * recvmsg waiters to exit.
> > +		 */
> > +		map->active.ring->in_error = -EBADF;
> > +		wake_up_interruptible(&map->active.inflight_conn_req);
> > +
> > +		/*
> > +		 * Wait until there are no more waiters on the mutexes.
> > +		 * We know that no new waiters can be added because sk_send_head
> > +		 * is set to NULL -- we only need to wait for the existing
> > +		 * waiters to return.
> > +		 */
> > +		while (!mutex_trylock(&map->active.in_mutex) ||
> > +			   !mutex_trylock(&map->active.out_mutex))
> > +			cpu_relax();
> > +
> > +		pvcalls_front_free_map(bedata, map);
> > +		kfree(map);
> > +	} else {
> > +		spin_lock(&bedata->socket_lock);
> > +		if (READ_ONCE(map->passive.inflight_req_id) !=
> > +		    PVCALLS_INVALID_ID) {
> > +			pvcalls_front_free_map(bedata,
> 
> pvcalls_front_free_map will try to grab bedata->socket_lock, which we are already holding.

This is a mistake, well spotted! I'll add a boolean "locked" parameter
to pvcalls_front_free_map. If (locked), pvcalls_front_free_map won't
spin_lock.
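
Putting those pieces together, a rough sketch of the reworked helper
(the "locked" parameter and the kfree() move are the changes described
above; this is a sketch of the plan, not the final patch):

static void pvcalls_front_free_map(struct pvcalls_bedata *bedata,
				   struct sock_mapping *map, bool locked)
{
	int i;

	unbind_from_irqhandler(map->active.irq, map);

	/* Callers that already hold socket_lock pass locked == true. */
	if (!locked)
		spin_lock(&bedata->socket_lock);
	if (!list_empty(&map->list))
		list_del(&map->list);
	if (!locked)
		spin_unlock(&bedata->socket_lock);

	for (i = 0; i < (1 << PVCALLS_RING_ORDER); i++)
		gnttab_end_foreign_access(map->active.ring->ref[i], 0, 0);
	gnttab_end_foreign_access(map->active.ref, 0, 0);
	free_page((unsigned long)map->active.ring);

	kfree(map);
}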


> 
> > +					       map->passive.accept_map);
> > +			kfree(map->passive.accept_map);
> > +		}
> > +		list_del_init(&map->list);
> 
> Again, no init?

Yes, I'll remove


> > +		kfree(map);
> > +		spin_unlock(&bedata->socket_lock);
> > +	}
> > +	WRITE_ONCE(bedata->rsp[req_id].req_id, PVCALLS_INVALID_ID);
> >
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

end of thread, other threads:[~2017-10-24 17:17 UTC | newest]

Thread overview: 73+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-07  0:30 [PATCH v5 00/13] introduce the Xen PV Calls frontend:wq Stefano Stabellini
2017-10-07  0:30 ` [PATCH v5 01/13] xen/pvcalls: introduce the pvcalls xenbus frontend Stefano Stabellini
2017-10-07  0:30   ` Stefano Stabellini
2017-10-07  0:30   ` [PATCH v5 02/13] xen/pvcalls: implement frontend disconnect Stefano Stabellini
2017-10-07  0:30   ` Stefano Stabellini
2017-10-17 16:01     ` Boris Ostrovsky
2017-10-17 16:01     ` Boris Ostrovsky
2017-10-23 22:44       ` Stefano Stabellini
2017-10-23 22:44       ` Stefano Stabellini
2017-10-07  0:30   ` [PATCH v5 03/13] xen/pvcalls: connect to the backend Stefano Stabellini
2017-10-07  0:30     ` Stefano Stabellini
2017-10-07  0:30   ` [PATCH v5 04/13] xen/pvcalls: implement socket command and handle events Stefano Stabellini
2017-10-07  0:30     ` Stefano Stabellini
2017-10-17 16:59     ` Boris Ostrovsky
2017-10-17 16:59     ` Boris Ostrovsky
2017-10-20  1:26       ` Stefano Stabellini
2017-10-20  1:26       ` Stefano Stabellini
2017-10-20 14:24         ` Boris Ostrovsky
2017-10-20 14:24         ` Boris Ostrovsky
2017-10-07  0:30   ` [PATCH v5 05/13] xen/pvcalls: implement connect command Stefano Stabellini
2017-10-07  0:30     ` Stefano Stabellini
2017-10-07  0:30   ` [PATCH v5 06/13] xen/pvcalls: implement bind command Stefano Stabellini
2017-10-07  0:30     ` Stefano Stabellini
2017-10-17 17:39     ` Boris Ostrovsky
2017-10-20  1:31       ` Stefano Stabellini
2017-10-20  1:31       ` Stefano Stabellini
2017-10-20 14:40         ` Boris Ostrovsky
2017-10-20 14:40         ` Boris Ostrovsky
2017-10-17 17:39     ` Boris Ostrovsky
2017-10-07  0:30   ` [PATCH v5 07/13] xen/pvcalls: implement listen command Stefano Stabellini
2017-10-07  0:30     ` Stefano Stabellini
2017-10-07  0:30   ` [PATCH v5 08/13] xen/pvcalls: implement accept command Stefano Stabellini
2017-10-17 18:34     ` Boris Ostrovsky
2017-10-17 18:34     ` Boris Ostrovsky
2017-10-23 23:03       ` Stefano Stabellini
2017-10-24 13:52         ` Boris Ostrovsky
2017-10-24 16:42           ` Stefano Stabellini
2017-10-24 16:42           ` Stefano Stabellini
2017-10-24 13:52         ` Boris Ostrovsky
2017-10-23 23:03       ` Stefano Stabellini
2017-10-07  0:30   ` Stefano Stabellini
2017-10-07  0:30   ` [PATCH v5 09/13] xen/pvcalls: implement sendmsg Stefano Stabellini
2017-10-07  0:30     ` Stefano Stabellini
2017-10-17 21:06     ` Boris Ostrovsky
2017-10-17 21:06     ` Boris Ostrovsky
2017-10-20  1:41       ` Stefano Stabellini
2017-10-20  1:41       ` Stefano Stabellini
2017-10-20 14:44         ` Boris Ostrovsky
2017-10-20 14:44         ` Boris Ostrovsky
2017-10-07  0:30   ` [PATCH v5 10/13] xen/pvcalls: implement recvmsg Stefano Stabellini
2017-10-07  0:30   ` Stefano Stabellini
2017-10-17 21:35     ` Boris Ostrovsky
2017-10-17 21:35     ` Boris Ostrovsky
2017-10-20  1:38       ` Stefano Stabellini
2017-10-20  1:38       ` Stefano Stabellini
2017-10-20 14:43         ` Boris Ostrovsky
2017-10-20 14:43         ` Boris Ostrovsky
2017-10-07  0:30   ` [PATCH v5 11/13] xen/pvcalls: implement poll command Stefano Stabellini
2017-10-07  0:30     ` Stefano Stabellini
2017-10-17 22:15     ` Boris Ostrovsky
2017-10-17 22:15       ` Boris Ostrovsky
2017-10-23 23:06       ` Stefano Stabellini
2017-10-23 23:06       ` Stefano Stabellini
2017-10-24 14:02         ` Boris Ostrovsky
2017-10-24 14:02         ` Boris Ostrovsky
2017-10-07  0:30   ` [PATCH v5 12/13] xen/pvcalls: implement release command Stefano Stabellini
2017-10-07  0:30     ` Stefano Stabellini
2017-10-24 14:17     ` Boris Ostrovsky
2017-10-24 14:17     ` Boris Ostrovsky
2017-10-24 17:17       ` Stefano Stabellini
2017-10-24 17:17       ` Stefano Stabellini
2017-10-07  0:30   ` [PATCH v5 13/13] xen: introduce a Kconfig option to enable the pvcalls frontend Stefano Stabellini
2017-10-07  0:30     ` Stefano Stabellini
