* [PATCHv1 0/2] xen/privcmd: prevent page migration for hypercall buffers
@ 2016-08-03 17:03 David Vrabel
  2016-08-03 17:03 ` [PATCHv1 1/2] xen/privcmd: use ENOTTY if the IOCTL is not supported David Vrabel
  2016-08-03 17:03 ` [PATCHv1 2/2] xen/privcmd: add ioctls for locking/unlocking hypercall buffers David Vrabel
  0 siblings, 2 replies; 5+ messages in thread
From: David Vrabel @ 2016-08-03 17:03 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Boris Ostrovsky, David Vrabel

libxencall currently uses mlock()ed buffers for hypercall buffers.
These pages are still subject to compaction and page migration, so a
userspace process may see a hypercall fail with -EFAULT if a page
backing a hypercall buffer is in the process of being migrated.

Page migration can be prevented by taking an additional reference to
the page.

There are three possible ways to do this:

1. Add a new device which, when mmap()'d, populates the VMA with
   suitable memory that has an additional reference.  This would give
   VMAs with the appropriate properties set (e.g., VM_DONTCOPY)
   without requiring additional madvise() calls.

2. Add a new ioctl to privcmd to populate a VMA with suitably
   ref-counted pages.  However, mmap() on privcmd is already used for
   foreign mappings and the VMA properties for hypercall buffers and
   foreign mappings might need to be different and the handling of
   unmap will need to be different.  This might be tricky.

3. Add a pair of ioctls to privcmd to lock/unlock a buffer by
   getting/putting an additional reference to the page.  This is
   what's implemented in this series.
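For illustration, option 3 would look roughly like this from userspace
(a sketch, not part of the series: the struct layout and ioctl numbers
mirror patch 2, while the wrapper names and the already-open privcmd
file descriptor are assumptions):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Mirrors struct privcmd_hcall_buf from patch 2's uapi header. */
struct privcmd_hcall_buf {
	void *start;
	size_t len;
};

/* Ioctl numbers as defined in patch 2 of this series. */
#define IOCTL_PRIVCMD_HCALL_BUF_LOCK				\
	_IOC(_IOC_NONE, 'P', 5, sizeof(struct privcmd_hcall_buf))
#define IOCTL_PRIVCMD_HCALL_BUF_UNLOCK				\
	_IOC(_IOC_NONE, 'P', 6, sizeof(struct privcmd_hcall_buf))

/* Hypothetical wrapper: pin a hypercall buffer so its pages cannot be
 * migrated while a hypercall references them. */
static int hcall_buf_lock(int privcmd_fd, void *start, size_t len)
{
	struct privcmd_hcall_buf buf = { .start = start, .len = len };

	return ioctl(privcmd_fd, IOCTL_PRIVCMD_HCALL_BUF_LOCK, &buf);
}

/* Hypothetical wrapper: release the pin taken by hcall_buf_lock();
 * the (start, len) pair must match the lock exactly. */
static int hcall_buf_unlock(int privcmd_fd, void *start, size_t len)
{
	struct privcmd_hcall_buf buf = { .start = start, .len = len };

	return ioctl(privcmd_fd, IOCTL_PRIVCMD_HCALL_BUF_UNLOCK, &buf);
}
```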

Any thoughts on which is the best approach?

David


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* [PATCHv1 1/2] xen/privcmd: use ENOTTY if the IOCTL is not supported
  2016-08-03 17:03 [PATCHv1 0/2] xen/privcmd: prevent page migration for hypercall buffers David Vrabel
@ 2016-08-03 17:03 ` David Vrabel
  2016-08-04 14:26   ` Boris Ostrovsky
  2016-08-03 17:03 ` [PATCHv1 2/2] xen/privcmd: add ioctls for locking/unlocking hypercall buffers David Vrabel
  1 sibling, 1 reply; 5+ messages in thread
From: David Vrabel @ 2016-08-03 17:03 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Boris Ostrovsky, David Vrabel

The standard return value for ioctl(2) where the cmd is not supported
is ENOTTY, not EINVAL.
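This matches what the VFS itself does for files whose driver provides
no ioctl handler at all, which can be observed from userspace (a small
demonstration, not part of the patch; /dev/null is used because its
driver implements no ioctls, and the command number is arbitrary):

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Issue a deliberately unknown ioctl and return the resulting errno.
 * /dev/null's driver has no unlocked_ioctl, so the VFS rejects the
 * request with -ENOTTY before any driver code runs. */
static int bogus_ioctl_errno(void)
{
	int fd = open("/dev/null", O_RDONLY);
	int err = 0;

	if (fd < 0)
		return -1;
	if (ioctl(fd, 0xdeadbeef) == -1)	/* arbitrary unknown cmd */
		err = errno;
	close(fd);
	return err;
}
```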

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/xen/privcmd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 702040f..ac76bc4 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -572,7 +572,7 @@ static long privcmd_ioctl(struct file *file,
 		break;
 
 	default:
-		ret = -EINVAL;
+		ret = -ENOTTY;
 		break;
 	}
 
-- 
2.1.4



* [PATCHv1 2/2] xen/privcmd: add ioctls for locking/unlocking hypercall buffers
  2016-08-03 17:03 [PATCHv1 0/2] xen/privcmd: prevent page migration for hypercall buffers David Vrabel
  2016-08-03 17:03 ` [PATCHv1 1/2] xen/privcmd: use ENOTTY if the IOCTL is not supported David Vrabel
@ 2016-08-03 17:03 ` David Vrabel
  2016-08-04  6:52   ` Jan Beulich
  1 sibling, 1 reply; 5+ messages in thread
From: David Vrabel @ 2016-08-03 17:03 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Boris Ostrovsky, David Vrabel

Using mlock() for hypercall buffers is not sufficient since mlocked
pages are still subject to compaction and page migration.  Page
migration can be prevented by taking additional references to the
pages.

Introduce two new ioctls: IOCTL_PRIVCMD_HCALL_BUF_LOCK and
IOCTL_PRIVCMD_HCALL_BUF_UNLOCK which get and put the necessary page
references.  The buffers do not need to be page aligned and they may
overlap with other buffers.  However, it is not possible to partially
unlock a buffer (i.e., the LOCK/UNLOCK must always be paired).  Any
locked buffers are automatically unlocked when the file descriptor is
closed.

An alternative approach would be to extend the driver with an ioctl to
populate a VMA with "special", non-migratable pages.  But the
LOCK/UNLOCK ioctls are more flexible as they allow any page to be used
for hypercalls (e.g., stack, mmap'd files, etc.).  This could be used
to minimize bouncing for performance critical hypercalls.

Locked buffers are stored in an rbtree for faster lookup during
UNLOCK, and in a list so all buffers can be unlocked when the file
descriptor is closed.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/xen/privcmd.c      | 208 +++++++++++++++++++++++++++++++++++++++++++++
 include/uapi/xen/privcmd.h |  37 ++++++++
 2 files changed, 245 insertions(+)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index ac76bc4..7fc9db8 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -22,6 +22,7 @@
 #include <linux/pagemap.h>
 #include <linux/seq_file.h>
 #include <linux/miscdevice.h>
+#include <linux/rbtree.h>
 
 #include <asm/pgalloc.h>
 #include <asm/pgtable.h>
@@ -43,6 +44,21 @@ MODULE_LICENSE("GPL");
 
 #define PRIV_VMA_LOCKED ((void *)1)
 
+struct privcmd_data {
+	struct rb_root hbuf_root;
+	struct list_head hbuf_list;
+	struct mutex hbuf_mutex;
+};
+
+struct privcmd_hbuf {
+	struct rb_node node;
+	struct list_head release_node;
+	struct privcmd_hcall_buf buf;
+	unsigned int nr_pages;
+	struct page **pages;
+	unsigned int count;
+};
+
 static int privcmd_vma_range_is_mapped(
                struct vm_area_struct *vma,
                unsigned long addr,
@@ -548,9 +564,164 @@ out_unlock:
 	goto out;
 }
 
+static int privcmd_hbuf_compare(struct privcmd_hcall_buf *a,
+				struct privcmd_hcall_buf *b)
+{
+	if (b->start == a->start)
+		return b->len - a->len;
+	return b->start - a->start;
+}
+
+static void privcmd_hbuf_insert(struct privcmd_data *priv,
+				struct privcmd_hbuf *hbuf)
+{
+	struct rb_node **new = &priv->hbuf_root.rb_node, *parent = NULL;
+
+	/* Figure out where to put new node */
+	while (*new) {
+		struct privcmd_hbuf *this = container_of(*new, struct privcmd_hbuf, node);
+		int result = privcmd_hbuf_compare(&hbuf->buf, &this->buf);
+
+		parent = *new;
+		if (result < 0)
+			new = &(*new)->rb_left;
+		else if (result > 0)
+			new = &(*new)->rb_right;
+		else {
+			this->count++;
+			kfree(hbuf->pages);
+			kfree(hbuf);
+			return;
+		}
+	}
+
+	/* Add new node and rebalance tree. */
+	rb_link_node(&hbuf->node, parent, new);
+	rb_insert_color(&hbuf->node, &priv->hbuf_root);
+
+	list_add_tail(&hbuf->release_node, &priv->hbuf_list);
+
+	hbuf->count = 1;
+}
+
+static struct privcmd_hbuf *privcmd_hbuf_lookup(struct privcmd_data *priv,
+						struct privcmd_hcall_buf *key)
+{
+	struct rb_node *node = priv->hbuf_root.rb_node;
+
+	while (node) {
+		struct privcmd_hbuf *hbuf = container_of(node, struct privcmd_hbuf, node);
+		int result;
+
+		result = privcmd_hbuf_compare(key, &hbuf->buf);
+
+		if (result < 0)
+			node = node->rb_left;
+		else if (result > 0)
+			node = node->rb_right;
+		else
+			return hbuf;
+	}
+	return NULL;
+}
+
+static void privcmd_hbuf_unlock(struct privcmd_data *priv,
+				struct privcmd_hbuf *hbuf)
+{
+	unsigned int i;
+
+	for (i = 0; i < hbuf->nr_pages; i++)
+		put_page(hbuf->pages[i]);
+
+	if (--hbuf->count == 0) {
+		rb_erase(&hbuf->node, &priv->hbuf_root);
+		list_del(&hbuf->release_node);
+		kfree(hbuf->pages);
+		kfree(hbuf);
+	}
+}
+
+static int privcmd_ioctl_hcall_buf_lock(struct privcmd_data *priv,
+					void __user *udata)
+{
+	struct privcmd_hbuf *hbuf;
+	unsigned long start, end;
+	int ret;
+
+	hbuf = kzalloc(sizeof(*hbuf), GFP_KERNEL);
+	if (!hbuf)
+		return -ENOMEM;
+
+	if (copy_from_user(&hbuf->buf, udata, sizeof(hbuf->buf))) {
+		ret = -EFAULT;
+		goto error;
+	}
+
+	start = (unsigned long)hbuf->buf.start & PAGE_MASK;
+	end = ALIGN((unsigned long)hbuf->buf.start + hbuf->buf.len, PAGE_SIZE);
+	hbuf->nr_pages = (end - start) / PAGE_SIZE;
+
+	hbuf->pages = kcalloc(hbuf->nr_pages, sizeof(*hbuf->pages), GFP_KERNEL);
+	if (!hbuf->pages) {
+		ret = -ENOMEM;
+		goto error;
+	}
+
+	/*
+	 * Take a reference to each page, this will prevent swapping
+	 * and page migration.
+	 */
+	ret = get_user_pages_fast(start, hbuf->nr_pages, 1, hbuf->pages);
+	if (ret < 0)
+		goto error;
+
+	mutex_lock(&priv->hbuf_mutex);
+	privcmd_hbuf_insert(priv, hbuf);
+	mutex_unlock(&priv->hbuf_mutex);
+
+	return 0;
+
+  error:
+	kfree(hbuf->pages);
+	kfree(hbuf);
+	return ret;
+}
+
+static int privcmd_ioctl_hcall_buf_unlock(struct privcmd_data *priv,
+					  void __user *udata)
+{
+	struct privcmd_hcall_buf hcall_buf;
+	struct privcmd_hbuf *hbuf;
+
+	if (copy_from_user(&hcall_buf, udata, sizeof(hcall_buf)))
+		return -EFAULT;
+
+	mutex_lock(&priv->hbuf_mutex);
+
+	hbuf = privcmd_hbuf_lookup(priv, &hcall_buf);
+	if (hbuf)
+		privcmd_hbuf_unlock(priv, hbuf);
+
+	mutex_unlock(&priv->hbuf_mutex);
+
+	return 0;
+}
+
+static void privcmd_hbuf_erase_all(struct privcmd_data *priv)
+{
+	while (!list_empty(&priv->hbuf_list)) {
+		struct privcmd_hbuf *hbuf;
+
+		hbuf = list_first_entry(&priv->hbuf_list,
+					struct privcmd_hbuf, release_node);
+		privcmd_hbuf_unlock(priv, hbuf);
+	}
+}
+
 static long privcmd_ioctl(struct file *file,
 			  unsigned int cmd, unsigned long data)
 {
+	struct privcmd_data *priv = file->private_data;
 	int ret = -ENOSYS;
 	void __user *udata = (void __user *) data;
 
@@ -571,6 +742,14 @@ static long privcmd_ioctl(struct file *file,
 		ret = privcmd_ioctl_mmap_batch(udata, 2);
 		break;
 
+	case IOCTL_PRIVCMD_HCALL_BUF_LOCK:
+		ret = privcmd_ioctl_hcall_buf_lock(priv, udata);
+		break;
+
+	case IOCTL_PRIVCMD_HCALL_BUF_UNLOCK:
+		ret = privcmd_ioctl_hcall_buf_unlock(priv, udata);
+		break;
+
 	default:
 		ret = -ENOTTY;
 		break;
@@ -644,8 +823,37 @@ static int privcmd_vma_range_is_mapped(
 				   is_mapped_fn, NULL) != 0;
 }
 
+static int privcmd_open(struct inode *inode, struct file *file)
+{
+	struct privcmd_data *priv;
+
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+
+	priv->hbuf_root = RB_ROOT;
+	INIT_LIST_HEAD(&priv->hbuf_list);
+	mutex_init(&priv->hbuf_mutex);
+
+	file->private_data = priv;
+
+	return 0;
+}
+
+static int privcmd_release(struct inode *inode, struct file *file)
+{
+	struct privcmd_data *priv = file->private_data;
+
+	privcmd_hbuf_erase_all(priv);
+
+	kfree(priv);
+	return 0;
+}
+
 const struct file_operations xen_privcmd_fops = {
 	.owner = THIS_MODULE,
+	.open = privcmd_open,
+	.release = privcmd_release,
 	.unlocked_ioctl = privcmd_ioctl,
 	.mmap = privcmd_mmap,
 };
diff --git a/include/uapi/xen/privcmd.h b/include/uapi/xen/privcmd.h
index 7ddeeda..425bd02 100644
--- a/include/uapi/xen/privcmd.h
+++ b/include/uapi/xen/privcmd.h
@@ -77,6 +77,11 @@ struct privcmd_mmapbatch_v2 {
 	int __user *err;  /* array of error codes */
 };
 
+struct privcmd_hcall_buf {
+	void *start;
+	size_t len;
+};
+
 /*
  * @cmd: IOCTL_PRIVCMD_HYPERCALL
  * @arg: &privcmd_hypercall_t
@@ -99,4 +104,36 @@ struct privcmd_mmapbatch_v2 {
 #define IOCTL_PRIVCMD_MMAPBATCH_V2				\
 	_IOC(_IOC_NONE, 'P', 4, sizeof(struct privcmd_mmapbatch_v2))
 
+/*
+ * @cmd: IOCTL_PRIVCMD_HCALL_BUF_LOCK
+ * @arg: struct privcmd_hcall_buf *
+ * Return: 0 on success. On an error, -1 is returned and errno is set
+ * to EINVAL, ENOMEM, or EFAULT.
+ *
+ * Locks a memory buffer so it may be used in a hypercall.  This is
+ * similar to mlock(2) but also prevents compaction/page migration.
+ *
+ * The buffers may have any alignment and size and may overlap other
+ * buffers.
+ *
+ * Locked buffers are unlocked with IOCTL_PRIVCMD_HCALL_BUF_UNLOCK or
+ * by closing the file handle.
+ */
+#define IOCTL_PRIVCMD_HCALL_BUF_LOCK				\
+	_IOC(_IOC_NONE, 'P', 5, sizeof(struct privcmd_hcall_buf))
+
+/*
+ * @cmd: IOCTL_PRIVCMD_HCALL_BUF_UNLOCK
+ * @arg: struct privcmd_hcall_buf *
+ * Return: Always 0.
+ *
+ * Unlocks a memory buffer previously locked with
+ * IOCTL_PRIVCMD_HCALL_BUF_LOCK.
+ *
+ * It is not possible to partially unlock a buffer.  i.e., the
+ * LOCK/UNLOCK must be exactly paired.
+ */
+#define IOCTL_PRIVCMD_HCALL_BUF_UNLOCK				\
+	_IOC(_IOC_NONE, 'P', 6, sizeof(struct privcmd_hcall_buf))
+
 #endif /* __LINUX_PUBLIC_PRIVCMD_H__ */
-- 
2.1.4



* Re: [PATCHv1 2/2] xen/privcmd: add ioctls for locking/unlocking hypercall buffers
  2016-08-03 17:03 ` [PATCHv1 2/2] xen/privcmd: add ioctls for locking/unlocking hypercall buffers David Vrabel
@ 2016-08-04  6:52   ` Jan Beulich
  0 siblings, 0 replies; 5+ messages in thread
From: Jan Beulich @ 2016-08-04  6:52 UTC (permalink / raw)
  To: David Vrabel; +Cc: Juergen Gross, xen-devel, Boris Ostrovsky

>>> On 03.08.16 at 19:03, <david.vrabel@citrix.com> wrote:
> Locked buffers are stored in a rbtree for faster lookup during UNLOCK,
> and stored in a list so all buffers can be unlocked when the file
> descriptor is closed.

Is there no sufficiently easy way to walk an rbtree, so the duplicate
putting on a list could be avoided?
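One way would be to repeatedly take the leftmost node with rb_first()
(an untested sketch against the types introduced in this patch; it
relies on privcmd_hbuf_unlock() eventually erasing the node from the
tree once its count drops to zero):

```c
/* Tear down all locked buffers by draining the rbtree directly,
 * making the separate hbuf_list redundant. */
static void privcmd_hbuf_erase_all(struct privcmd_data *priv)
{
	struct rb_node *node;

	while ((node = rb_first(&priv->hbuf_root))) {
		struct privcmd_hbuf *hbuf =
			rb_entry(node, struct privcmd_hbuf, node);

		/* Drops one reference per call; once count reaches
		 * zero the node is erased and rb_first() moves on. */
		privcmd_hbuf_unlock(priv, hbuf);
	}
}
```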

> +static int privcmd_hbuf_compare(struct privcmd_hcall_buf *a,
> +				struct privcmd_hcall_buf *b)
> +{
> +	if (b->start == a->start)
> +		return b->len - a->len;
> +	return b->start - a->start;

I don't think subtraction can be used for either of these, or truncation
may result on 64-bit arches. Or, since the function only gets called
directly, its return type (and respective variables in callers) would
need to change to long.
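A truncation-safe variant can avoid subtraction entirely (a userspace
sketch of the comparison logic only; it mirrors the series' struct
privcmd_hcall_buf and uses conventional ascending a-versus-b order
rather than the posted b-versus-a form):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors struct privcmd_hcall_buf from the series (pointer + length). */
struct privcmd_hcall_buf {
	void *start;
	size_t len;
};

/* Three-way compare using explicit < / > tests instead of subtraction,
 * so a 64-bit difference is never narrowed into an int return value. */
static int hbuf_compare(const struct privcmd_hcall_buf *a,
			const struct privcmd_hcall_buf *b)
{
	uintptr_t as = (uintptr_t)a->start;
	uintptr_t bs = (uintptr_t)b->start;

	if (as != bs)
		return as < bs ? -1 : 1;
	if (a->len != b->len)
		return a->len < b->len ? -1 : 1;
	return 0;
}
```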

> +static void privcmd_hbuf_unlock(struct privcmd_data *priv,
> +				struct privcmd_hbuf *hbuf)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; i < hbuf->nr_pages; i++)
> +		put_page(hbuf->pages[i]);

Is taking extra references on each lock (and hence dropping them
here on each unlock) really worthwhile, considering that you refcount
the hbuf structures anyway? Taking just a single reference would
not only eliminate no matter how small refcount overflow concerns,
but also ...

> +static void privcmd_hbuf_erase_all(struct privcmd_data *priv)
> +{
> +	while (!list_empty(&priv->hbuf_list)) {
> +		struct privcmd_hbuf *hbuf;
> +
> +		hbuf = list_first_entry(&priv->hbuf_list,
> +					struct privcmd_hbuf, release_node);
> +		privcmd_hbuf_unlock(priv, hbuf);
> +	}
> +}

... decrease the iteration count needed here.


> +	case IOCTL_PRIVCMD_HCALL_BUF_UNLOCK:
> +		ret = privcmd_ioctl_hcall_buf_unlock(priv, udata);

Looking at the function's implementation, this yields success even
if there was no matching structure in the tree. Shouldn't the caller
rather be made aware of the bad request?
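If so, the tail of privcmd_ioctl_hcall_buf_unlock() could report the
miss (a sketch; -ENOENT is an assumed choice of errno here, the series
itself does not pick one):

```c
	int ret = -ENOENT;	/* assumed errno for "no such locked buffer" */

	mutex_lock(&priv->hbuf_mutex);
	hbuf = privcmd_hbuf_lookup(priv, &hcall_buf);
	if (hbuf) {
		privcmd_hbuf_unlock(priv, hbuf);
		ret = 0;
	}
	mutex_unlock(&priv->hbuf_mutex);

	return ret;
```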

> --- a/include/uapi/xen/privcmd.h
> +++ b/include/uapi/xen/privcmd.h
> @@ -77,6 +77,11 @@ struct privcmd_mmapbatch_v2 {
>  	int __user *err;  /* array of error codes */
>  };
>  
> +struct privcmd_hcall_buf {
> +	void *start;

Even if you don't reref this pointer, shouldn't it be __user nevertheless?

Jan



* Re: [PATCHv1 1/2] xen/privcmd: use ENOTTY if the IOCTL is not supported
  2016-08-03 17:03 ` [PATCHv1 1/2] xen/privcmd: use ENOTTY if the IOCTL is not supported David Vrabel
@ 2016-08-04 14:26   ` Boris Ostrovsky
  0 siblings, 0 replies; 5+ messages in thread
From: Boris Ostrovsky @ 2016-08-04 14:26 UTC (permalink / raw)
  To: David Vrabel, xen-devel; +Cc: Juergen Gross

On 08/03/2016 01:03 PM, David Vrabel wrote:
> The standard return value for ioctl(2) where the cmd is not supported
> is ENOTTY, not EINVAL.
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
>  drivers/xen/privcmd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
> index 702040f..ac76bc4 100644
> --- a/drivers/xen/privcmd.c
> +++ b/drivers/xen/privcmd.c
> @@ -572,7 +572,7 @@ static long privcmd_ioctl(struct file *file,
>  		break;
>  
>  	default:
> -		ret = -EINVAL;
> +		ret = -ENOTTY;
>  		break;
>  	}
>  

evtchn_ioctl() returns -ENOSYS so probably also should be fixed.

In fact, ioctls in drivers/xen/ should be made consistent in what they
do for unknown command.

-boris


