All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHv2 0/2] xen/privcmd: prevent page migration for hypercall buffers
@ 2016-08-04 15:16 David Vrabel
  2016-08-04 15:16 ` [PATCHv2 1/2] xen/prvicmd: use ENOTTY if the IOCTL is not supported David Vrabel
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: David Vrabel @ 2016-08-04 15:16 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Boris Ostrovsky, David Vrabel

Currently libxencall using mlocked buffers for hypercall buffers.
This pages are subject to compaction and page migration. A userspace
process may see a hypercall fail with -EFAULT if a page backing a
hypercall buffer is in the process of being migrated.

Page migration can be prevented by taking an additional reference to
the page.

There are three possible ways to do this:

1. Add a new device which when mmap()'d populated the VMA with
   suitable memory that has an additional reference.  This would give
   VMAs that have appropriate properties set (e.g., VM_DONTCOPY)
   without having to do additional madvise() calls.

2. Add a new ioctl to privcmd to populate a VMA with suitably
   ref-counted pages.  However, mmap() on privcmd is already used for
   foreign mappings and the VMA properties for hypercall buffers and
   foreign mappings might need to be different and the handling of
   unmap will need to be different.  This might be tricky.

3. Add a pair of ioctls to privcmd to lock/unlock a buffer by
   getting/putting an additional reference to the page.  This is
   what's implemented in this series.

Any thoughts on which is the best approach?

Changes in v2:
- Addressed Jan's feedback for #2 (see patch for details).

David


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCHv2 1/2] xen/prvicmd: use ENOTTY if the IOCTL is not supported
  2016-08-04 15:16 [PATCHv2 0/2] xen/privcmd: prevent page migration for hypercall buffers David Vrabel
@ 2016-08-04 15:16 ` David Vrabel
  2016-08-08 10:35   ` Juergen Gross
  2016-08-04 15:16 ` [PATCHv2 2/2] xen/privcmd: add ioctls for locking/unlocking hypercall buffers David Vrabel
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: David Vrabel @ 2016-08-04 15:16 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Boris Ostrovsky, David Vrabel

The standard return value for ioctl(2) where the cmd is not supported
is ENOTTY, not EINVAL.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/xen/privcmd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 702040f..ac76bc4 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -572,7 +572,7 @@ static long privcmd_ioctl(struct file *file,
 		break;
 
 	default:
-		ret = -EINVAL;
+		ret = -ENOTTY;
 		break;
 	}
 
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCHv2 2/2] xen/privcmd: add ioctls for locking/unlocking hypercall buffers
  2016-08-04 15:16 [PATCHv2 0/2] xen/privcmd: prevent page migration for hypercall buffers David Vrabel
  2016-08-04 15:16 ` [PATCHv2 1/2] xen/prvicmd: use ENOTTY if the IOCTL is not supported David Vrabel
@ 2016-08-04 15:16 ` David Vrabel
  2016-08-04 16:02   ` Jan Beulich
  2016-08-22 17:25 ` [PATCHv2 0/2] xen/privcmd: prevent page migration for " David Vrabel
  2016-10-10 13:45 ` David Vrabel
  3 siblings, 1 reply; 11+ messages in thread
From: David Vrabel @ 2016-08-04 15:16 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Boris Ostrovsky, David Vrabel

Using mlock() for hypercall buffers is not sufficient since mlocked
pages are still subject to compaction and page migration.  Page
migration can be prevented by taking additional references to the
pages.

Introduce two new ioctls: IOCTL_PRIVCMD_HCALL_BUF_LOCK and
IOCTL_PRIVCMD_HCALL_BUF_UNLOCK which get and put the necessary page
references.  The buffers do not need to be page aligned and they may
overlap with other buffers.  However, it is not possible to partially
unlock a buffer (i.e., the LOCK/UNLOCK must always be paired).  Any
locked buffers are automatically unlocked when the file descriptor is
closed.

An alternative approach would be to extend the driver with an ioctl to
populate a VMA with "special", non-migratable pages.  But the
LOCK/UNLOCK ioctls are more flexible as they allow any page to be used
for hypercalls (e.g., stack, mmap'd files, etc.).  This could be used
to minimize bouncing for performance critical hypercalls.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
v2:
- Only take one reference per buffer.
- Interate over the tree to unlock/free all buffers.
- Max 1GB size per buffer.
- Only use one memory allocation per buffer.
- Fix overflow issues in privcmd_hbuf_compare().
- Return -ENXIO when unlocking an unlocked buffer.
---
 drivers/xen/privcmd.c      | 216 +++++++++++++++++++++++++++++++++++++++++++++
 include/uapi/xen/privcmd.h |  38 ++++++++
 2 files changed, 254 insertions(+)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index ac76bc4..5419b31 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -22,6 +22,7 @@
 #include <linux/pagemap.h>
 #include <linux/seq_file.h>
 #include <linux/miscdevice.h>
+#include <linux/rbtree.h>
 
 #include <asm/pgalloc.h>
 #include <asm/pgtable.h>
@@ -43,6 +44,19 @@ MODULE_LICENSE("GPL");
 
 #define PRIV_VMA_LOCKED ((void *)1)
 
+struct privcmd_data {
+	struct rb_root hbuf_root;
+	struct mutex hbuf_mutex;
+};
+
+struct privcmd_hbuf {
+	struct rb_node node;
+	struct privcmd_hcall_buf buf;
+	unsigned int count;
+	unsigned int nr_pages;
+	struct page *pages[0];
+};
+
 static int privcmd_vma_range_is_mapped(
                struct vm_area_struct *vma,
                unsigned long addr,
@@ -548,9 +562,175 @@ out_unlock:
 	goto out;
 }
 
+static void privcmd_hbuf_free(struct privcmd_hbuf *hbuf)
+{
+	unsigned int i;
+
+	for (i = 0; i < hbuf->nr_pages; i++)
+		put_page(hbuf->pages[i]);
+
+	kfree(hbuf);
+}
+
+static struct privcmd_hbuf *privcmd_hbuf_alloc(struct privcmd_hcall_buf *buf)
+{
+	struct privcmd_hbuf *hbuf;
+	unsigned long start, end, nr_pages;
+	int ret;
+
+	start = (unsigned long)buf->start & PAGE_MASK;
+	end = ALIGN((unsigned long)buf->start + buf->len, PAGE_SIZE);
+	nr_pages = (end - start) / PAGE_SIZE;
+
+	hbuf = kzalloc(sizeof(*hbuf) + nr_pages * sizeof(hbuf->pages[0]),
+		       GFP_KERNEL);
+	if (!hbuf)
+		return ERR_PTR(-ENOMEM);
+
+	/*
+	 * Take a reference to each page, this will prevent swapping
+	 * and page migration.
+	 */
+	ret = get_user_pages_fast(start, nr_pages, 1, hbuf->pages);
+	if (ret < 0)
+		goto error;
+
+	hbuf->buf = *buf;
+	hbuf->count = 1;
+	hbuf->nr_pages = ret;
+
+	if (hbuf->nr_pages != nr_pages) {
+		ret = -ENOMEM;
+		goto error;
+	}
+
+	return hbuf;
+
+  error:
+	privcmd_hbuf_free(hbuf);
+	return ERR_PTR(ret);
+}
+
+static int privcmd_hbuf_compare(struct privcmd_hcall_buf *a,
+				struct privcmd_hcall_buf *b)
+{
+	if (b->start > a->start)
+		return 1;
+	if (b->start < a->start)
+		return -1;
+	if (b->len > a->len)
+		return 1;
+	if (b->len < a->len)
+		return -1;
+	return 0;
+}
+
+static int privcmd_hbuf_insert(struct privcmd_data *priv,
+				struct privcmd_hcall_buf *buf)
+{
+	struct rb_node **new = &priv->hbuf_root.rb_node, *parent = NULL;
+	struct privcmd_hbuf *hbuf;
+
+	/* Figure out where to put new node */
+	while (*new) {
+		struct privcmd_hbuf *this = container_of(*new, struct privcmd_hbuf, node);
+		int result = privcmd_hbuf_compare(buf, &this->buf);
+
+		parent = *new;
+		if (result < 0)
+			new = &(*new)->rb_left;
+		else if (result > 0)
+			new = &(*new)->rb_right;
+		else {
+			this->count++;
+			return 0;
+		}
+	}
+
+	/* Allocate and insert a new node. */
+	hbuf = privcmd_hbuf_alloc(buf);
+	if (IS_ERR(hbuf))
+		return PTR_ERR(hbuf);
+
+	rb_link_node(&hbuf->node, parent, new);
+	rb_insert_color(&hbuf->node, &priv->hbuf_root);
+
+	return 0;
+}
+
+static int privcmd_hbuf_remove(struct privcmd_data *priv,
+			       struct privcmd_hcall_buf *key)
+{
+	struct rb_node *node = priv->hbuf_root.rb_node;
+
+	while (node) {
+		struct privcmd_hbuf *hbuf = container_of(node, struct privcmd_hbuf, node);
+		int result;
+
+		result = privcmd_hbuf_compare(key, &hbuf->buf);
+
+		if (result < 0)
+			node = node->rb_left;
+		else if (result > 0)
+			node = node->rb_right;
+		else {
+			if (--hbuf->count == 0) {
+				rb_erase(&hbuf->node, &priv->hbuf_root);
+				privcmd_hbuf_free(hbuf);
+			}
+			return 0;
+		}
+	}
+	return -ENXIO;
+}
+
+static int privcmd_ioctl_hcall_buf_lock(struct privcmd_data *priv,
+					void __user *udata)
+{
+	struct privcmd_hcall_buf buf;
+	int ret;
+
+	if (copy_from_user(&buf, udata, sizeof(buf)))
+		return -EFAULT;
+
+	if (buf.len > (1 << 30))
+		return -EINVAL;
+
+	mutex_lock(&priv->hbuf_mutex);
+	ret = privcmd_hbuf_insert(priv, &buf);
+	mutex_unlock(&priv->hbuf_mutex);
+
+	return ret;
+}
+
+static int privcmd_ioctl_hcall_buf_unlock(struct privcmd_data *priv,
+					  void __user *udata)
+{
+	struct privcmd_hcall_buf buf;
+	int ret;
+
+	if (copy_from_user(&buf, udata, sizeof(buf)))
+		return -EFAULT;
+
+	mutex_lock(&priv->hbuf_mutex);
+	ret = privcmd_hbuf_remove(priv, &buf);
+	mutex_unlock(&priv->hbuf_mutex);
+
+	return ret;
+}
+
+static void privcmd_hbuf_unlock_all(struct privcmd_data *priv)
+{
+	struct privcmd_hbuf *hbuf, *n;
+
+	rbtree_postorder_for_each_entry_safe(hbuf, n, &priv->hbuf_root, node)
+		privcmd_hbuf_free(hbuf);
+}
+
 static long privcmd_ioctl(struct file *file,
 			  unsigned int cmd, unsigned long data)
 {
+	struct privcmd_data *priv = file->private_data;
 	int ret = -ENOSYS;
 	void __user *udata = (void __user *) data;
 
@@ -571,6 +751,14 @@ static long privcmd_ioctl(struct file *file,
 		ret = privcmd_ioctl_mmap_batch(udata, 2);
 		break;
 
+	case IOCTL_PRIVCMD_HCALL_BUF_LOCK:
+		ret = privcmd_ioctl_hcall_buf_lock(priv, udata);
+		break;
+
+	case IOCTL_PRIVCMD_HCALL_BUF_UNLOCK:
+		ret = privcmd_ioctl_hcall_buf_unlock(priv, udata);
+		break;
+
 	default:
 		ret = -ENOTTY;
 		break;
@@ -644,8 +832,36 @@ static int privcmd_vma_range_is_mapped(
 				   is_mapped_fn, NULL) != 0;
 }
 
+static int privcmd_open(struct inode *inode, struct file *file)
+{
+	struct privcmd_data *priv;
+
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+
+	priv->hbuf_root = RB_ROOT;
+	mutex_init(&priv->hbuf_mutex);
+
+	file->private_data = priv;
+
+	return 0;
+}
+
+static int privcmd_release(struct inode *inode, struct file *file)
+{
+	struct privcmd_data *priv = file->private_data;
+
+	privcmd_hbuf_unlock_all(priv);
+
+	kfree(priv);
+	return 0;
+}
+
 const struct file_operations xen_privcmd_fops = {
 	.owner = THIS_MODULE,
+	.open = privcmd_open,
+	.release = privcmd_release,
 	.unlocked_ioctl = privcmd_ioctl,
 	.mmap = privcmd_mmap,
 };
diff --git a/include/uapi/xen/privcmd.h b/include/uapi/xen/privcmd.h
index 7ddeeda..d59d122 100644
--- a/include/uapi/xen/privcmd.h
+++ b/include/uapi/xen/privcmd.h
@@ -77,6 +77,11 @@ struct privcmd_mmapbatch_v2 {
 	int __user *err;  /* array of error codes */
 };
 
+struct privcmd_hcall_buf {
+	void __user *start;
+	size_t len;
+};
+
 /*
  * @cmd: IOCTL_PRIVCMD_HYPERCALL
  * @arg: &privcmd_hypercall_t
@@ -99,4 +104,37 @@ struct privcmd_mmapbatch_v2 {
 #define IOCTL_PRIVCMD_MMAPBATCH_V2				\
 	_IOC(_IOC_NONE, 'P', 4, sizeof(struct privcmd_mmapbatch_v2))
 
+/*
+ * @cmd: IOCTL_PRIVCMD_HCALL_BUF_LOCK
+ * @arg: struct privcmd hcall_buf *
+ * Return: 0 on success. On an error, -1 is returned and errno is set
+ * to EINVAL (buffer too large), ENOMEM, or EFAULT.
+ *
+ * Locks a memory buffer so it may be used in a hypercall.  This is
+ * similar to mlock(2) but also prevents compaction/page migration.
+ *
+ * The buffers may have any alignment and any size <= 1 GiB. They may
+ * overlap other buffers.
+ *
+ * Locked buffers are unlocked with IOCTL_PRIVCMD_HCALL_BUF_UNLOCK or
+ * by closing the file handle.
+ */
+#define IOCTL_PRIVCMD_HCALL_BUF_LOCK				\
+	_IOC(_IOC_NONE, 'P', 5, sizeof(struct privcmd_hcall_buf))
+
+/*
+ * @cmd: IOCTL_PRIVCMD_HCALL_BUF_UNLOCK
+ * @arg: struct privcmd hcall_buf *
+ * Return: 0 on success. On an error, -1 is returned and errno is set
+ * to ENXIO (buffer was not locked), or EFAULT.
+ *
+ * Unlocks a memory buffer previously locked with
+ * IOCTL_PRIVCMD_HCALL_BUF_LOCK.
+ *
+ * It is not possible to partially unlock a buffer.  i.e., the
+ * LOCK/UNLOCK must be exactly paired.
+ */
+#define IOCTL_PRIVCMD_HCALL_BUF_UNLOCK				\
+	_IOC(_IOC_NONE, 'P', 6, sizeof(struct privcmd_hcall_buf))
+
 #endif /* __LINUX_PUBLIC_PRIVCMD_H__ */
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCHv2 2/2] xen/privcmd: add ioctls for locking/unlocking hypercall buffers
  2016-08-04 15:16 ` [PATCHv2 2/2] xen/privcmd: add ioctls for locking/unlocking hypercall buffers David Vrabel
@ 2016-08-04 16:02   ` Jan Beulich
  2016-08-22 17:24     ` David Vrabel
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2016-08-04 16:02 UTC (permalink / raw)
  To: David Vrabel; +Cc: Juergen Gross, xen-devel, Boris Ostrovsky

>>> On 04.08.16 at 17:16, <david.vrabel@citrix.com> wrote:
> Using mlock() for hypercall buffers is not sufficient since mlocked
> pages are still subject to compaction and page migration.  Page
> migration can be prevented by taking additional references to the
> pages.
> 
> Introduce two new ioctls: IOCTL_PRIVCMD_HCALL_BUF_LOCK and
> IOCTL_PRIVCMD_HCALL_BUF_UNLOCK which get and put the necessary page
> references.  The buffers do not need to be page aligned and they may
> overlap with other buffers.  However, it is not possible to partially
> unlock a buffer (i.e., the LOCK/UNLOCK must always be paired).  Any
> locked buffers are automatically unlocked when the file descriptor is
> closed.
> 
> An alternative approach would be to extend the driver with an ioctl to
> populate a VMA with "special", non-migratable pages.  But the
> LOCK/UNLOCK ioctls are more flexible as they allow any page to be used
> for hypercalls (e.g., stack, mmap'd files, etc.).  This could be used
> to minimize bouncing for performance critical hypercalls.
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

with two remarks: Do you not care about compat mode callers? In
case you indeed mean to not support them, does the default
handling ensure they will see an error instead of you mis-interpreting
(and overrunning) the presented structure?

> +static struct privcmd_hbuf *privcmd_hbuf_alloc(struct privcmd_hcall_buf *buf)
> +{
> +	struct privcmd_hbuf *hbuf;
> +	unsigned long start, end, nr_pages;
> +	int ret;
> +
> +	start = (unsigned long)buf->start & PAGE_MASK;
> +	end = ALIGN((unsigned long)buf->start + buf->len, PAGE_SIZE);
> +	nr_pages = (end - start) / PAGE_SIZE;

So entry re-use is based on the actual length the caller passed,
while page tracking of course is page granular. Wouldn't it make
sense to re-use entries based on the pages they encompass,
which in particular would mean that two distinct buffers on the
same page would get just a single entry?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCHv2 1/2] xen/prvicmd: use ENOTTY if the IOCTL is not supported
  2016-08-04 15:16 ` [PATCHv2 1/2] xen/prvicmd: use ENOTTY if the IOCTL is not supported David Vrabel
@ 2016-08-08 10:35   ` Juergen Gross
  0 siblings, 0 replies; 11+ messages in thread
From: Juergen Gross @ 2016-08-08 10:35 UTC (permalink / raw)
  To: David Vrabel, xen-devel; +Cc: Boris Ostrovsky

On 04/08/16 17:16, David Vrabel wrote:
> The standard return value for ioctl(2) where the cmd is not supported
> is ENOTTY, not EINVAL.
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCHv2 2/2] xen/privcmd: add ioctls for locking/unlocking hypercall buffers
  2016-08-04 16:02   ` Jan Beulich
@ 2016-08-22 17:24     ` David Vrabel
  2016-08-23  8:46       ` Jan Beulich
  0 siblings, 1 reply; 11+ messages in thread
From: David Vrabel @ 2016-08-22 17:24 UTC (permalink / raw)
  To: Jan Beulich, David Vrabel; +Cc: Juergen Gross, xen-devel, Boris Ostrovsky

On 04/08/16 17:02, Jan Beulich wrote:
>>>> On 04.08.16 at 17:16, <david.vrabel@citrix.com> wrote:
>> Using mlock() for hypercall buffers is not sufficient since mlocked
>> pages are still subject to compaction and page migration.  Page
>> migration can be prevented by taking additional references to the
>> pages.
>>
>> Introduce two new ioctls: IOCTL_PRIVCMD_HCALL_BUF_LOCK and
>> IOCTL_PRIVCMD_HCALL_BUF_UNLOCK which get and put the necessary page
>> references.  The buffers do not need to be page aligned and they may
>> overlap with other buffers.  However, it is not possible to partially
>> unlock a buffer (i.e., the LOCK/UNLOCK must always be paired).  Any
>> locked buffers are automatically unlocked when the file descriptor is
>> closed.
>>
>> An alternative approach would be to extend the driver with an ioctl to
>> populate a VMA with "special", non-migratable pages.  But the
>> LOCK/UNLOCK ioctls are more flexible as they allow any page to be used
>> for hypercalls (e.g., stack, mmap'd files, etc.).  This could be used
>> to minimize bouncing for performance critical hypercalls.
>>
>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> with two remarks: Do you not care about compat mode callers?

No.  Compat mode callers can't make hypercalls since the hypervisor ABI
structures aren't converted anywhere.

> case you indeed mean to not support them, does the default
> handling ensure they will see an error instead of you mis-interpreting
> (and overrunning) the presented structure?

I will look at this.

>> +static struct privcmd_hbuf *privcmd_hbuf_alloc(struct privcmd_hcall_buf *buf)
>> +{
>> +	struct privcmd_hbuf *hbuf;
>> +	unsigned long start, end, nr_pages;
>> +	int ret;
>> +
>> +	start = (unsigned long)buf->start & PAGE_MASK;
>> +	end = ALIGN((unsigned long)buf->start + buf->len, PAGE_SIZE);
>> +	nr_pages = (end - start) / PAGE_SIZE;
> 
> So entry re-use is based on the actual length the caller passed,
> while page tracking of course is page granular. Wouldn't it make
> sense to re-use entries based on the pages they encompass,
> which in particular would mean that two distinct buffers on the
> same page would get just a single entry?

If you make the entries page aligned, you can't give always give an
error when the user unlocks something of the wrong size.

David

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCHv2 0/2] xen/privcmd: prevent page migration for hypercall buffers
  2016-08-04 15:16 [PATCHv2 0/2] xen/privcmd: prevent page migration for hypercall buffers David Vrabel
  2016-08-04 15:16 ` [PATCHv2 1/2] xen/prvicmd: use ENOTTY if the IOCTL is not supported David Vrabel
  2016-08-04 15:16 ` [PATCHv2 2/2] xen/privcmd: add ioctls for locking/unlocking hypercall buffers David Vrabel
@ 2016-08-22 17:25 ` David Vrabel
  2016-08-25  5:41   ` Juergen Gross
  2016-10-10 13:45 ` David Vrabel
  3 siblings, 1 reply; 11+ messages in thread
From: David Vrabel @ 2016-08-22 17:25 UTC (permalink / raw)
  To: David Vrabel, xen-devel; +Cc: Juergen Gross, Boris Ostrovsky

On 04/08/16 16:16, David Vrabel wrote:
> Currently libxencall using mlocked buffers for hypercall buffers.
> This pages are subject to compaction and page migration. A userspace
> process may see a hypercall fail with -EFAULT if a page backing a
> hypercall buffer is in the process of being migrated.
> 
> Page migration can be prevented by taking an additional reference to
> the page.
> 
> There are three possible ways to do this:
> 
> 1. Add a new device which when mmap()'d populated the VMA with
>    suitable memory that has an additional reference.  This would give
>    VMAs that have appropriate properties set (e.g., VM_DONTCOPY)
>    without having to do additional madvise() calls.
> 
> 2. Add a new ioctl to privcmd to populate a VMA with suitably
>    ref-counted pages.  However, mmap() on privcmd is already used for
>    foreign mappings and the VMA properties for hypercall buffers and
>    foreign mappings might need to be different and the handling of
>    unmap will need to be different.  This might be tricky.
> 
> 3. Add a pair of ioctls to privcmd to lock/unlock a buffer by
>    getting/putting an additional reference to the page.  This is
>    what's implemented in this series.
> 
> Any thoughts on which is the best approach?

Did no one have an opinions on these three options?

David

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCHv2 2/2] xen/privcmd: add ioctls for locking/unlocking hypercall buffers
  2016-08-22 17:24     ` David Vrabel
@ 2016-08-23  8:46       ` Jan Beulich
  2016-08-23  8:54         ` Andrew Cooper
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2016-08-23  8:46 UTC (permalink / raw)
  To: David Vrabel; +Cc: Juergen Gross, xen-devel, BorisOstrovsky

>>> On 22.08.16 at 19:24, <david.vrabel@citrix.com> wrote:
> On 04/08/16 17:02, Jan Beulich wrote:
>>>>> On 04.08.16 at 17:16, <david.vrabel@citrix.com> wrote:
>>> Using mlock() for hypercall buffers is not sufficient since mlocked
>>> pages are still subject to compaction and page migration.  Page
>>> migration can be prevented by taking additional references to the
>>> pages.
>>>
>>> Introduce two new ioctls: IOCTL_PRIVCMD_HCALL_BUF_LOCK and
>>> IOCTL_PRIVCMD_HCALL_BUF_UNLOCK which get and put the necessary page
>>> references.  The buffers do not need to be page aligned and they may
>>> overlap with other buffers.  However, it is not possible to partially
>>> unlock a buffer (i.e., the LOCK/UNLOCK must always be paired).  Any
>>> locked buffers are automatically unlocked when the file descriptor is
>>> closed.
>>>
>>> An alternative approach would be to extend the driver with an ioctl to
>>> populate a VMA with "special", non-migratable pages.  But the
>>> LOCK/UNLOCK ioctls are more flexible as they allow any page to be used
>>> for hypercalls (e.g., stack, mmap'd files, etc.).  This could be used
>>> to minimize bouncing for performance critical hypercalls.
>>>
>>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
>> 
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>> 
>> with two remarks: Do you not care about compat mode callers?
> 
> No.  Compat mode callers can't make hypercalls since the hypervisor ABI
> structures aren't converted anywhere.

Hmm - is putting a 64-bit kernel underneath a 32-bit installation
something that works for all other purposes, but is meant to
explicitly not work with Xen? The domctl/sysctl model (which the
hvmctl series follows, since the earlier hvmop already has this
requirement) already requires no translation.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCHv2 2/2] xen/privcmd: add ioctls for locking/unlocking hypercall buffers
  2016-08-23  8:46       ` Jan Beulich
@ 2016-08-23  8:54         ` Andrew Cooper
  0 siblings, 0 replies; 11+ messages in thread
From: Andrew Cooper @ 2016-08-23  8:54 UTC (permalink / raw)
  To: Jan Beulich, David Vrabel; +Cc: Juergen Gross, xen-devel, BorisOstrovsky

On 23/08/16 09:46, Jan Beulich wrote:
>>>> On 22.08.16 at 19:24, <david.vrabel@citrix.com> wrote:
>> On 04/08/16 17:02, Jan Beulich wrote:
>>>>>> On 04.08.16 at 17:16, <david.vrabel@citrix.com> wrote:
>>>> Using mlock() for hypercall buffers is not sufficient since mlocked
>>>> pages are still subject to compaction and page migration.  Page
>>>> migration can be prevented by taking additional references to the
>>>> pages.
>>>>
>>>> Introduce two new ioctls: IOCTL_PRIVCMD_HCALL_BUF_LOCK and
>>>> IOCTL_PRIVCMD_HCALL_BUF_UNLOCK which get and put the necessary page
>>>> references.  The buffers do not need to be page aligned and they may
>>>> overlap with other buffers.  However, it is not possible to partially
>>>> unlock a buffer (i.e., the LOCK/UNLOCK must always be paired).  Any
>>>> locked buffers are automatically unlocked when the file descriptor is
>>>> closed.
>>>>
>>>> An alternative approach would be to extend the driver with an ioctl to
>>>> populate a VMA with "special", non-migratable pages.  But the
>>>> LOCK/UNLOCK ioctls are more flexible as they allow any page to be used
>>>> for hypercalls (e.g., stack, mmap'd files, etc.).  This could be used
>>>> to minimize bouncing for performance critical hypercalls.
>>>>
>>>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>
>>> with two remarks: Do you not care about compat mode callers?
>> No.  Compat mode callers can't make hypercalls since the hypervisor ABI
>> structures aren't converted anywhere.
> Hmm - is putting a 64-bit kernel underneath a 32-bit installation
> something that works for all other purposes, but is meant to
> explicitly not work with Xen? The domctl/sysctl model (which the
> hvmctl series follows, since the earlier hvmop already has this
> requirement) already requires no translation.

There were some very hacky patches around 5 years ago which allowed 64
bit PV guests to execute the compat hypercall entry in Xen, for the
purpose of making 32bit userspace work in a 64bit guest under 64bit Xen.

The modifications required bypassing the hypercall page in the guest
kernel to execute `int $0x82` rather than `syscall`, and modifications
in Xen to follow the hypercall path rather than bounce `int $0x82` back
into the guest.

As for now, this is not something which functions in the slightest.  The
bitness of hypercall-aware userspace must match the bitness of the guest
kernel it is running under.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCHv2 0/2] xen/privcmd: prevent page migration for hypercall buffers
  2016-08-22 17:25 ` [PATCHv2 0/2] xen/privcmd: prevent page migration for " David Vrabel
@ 2016-08-25  5:41   ` Juergen Gross
  0 siblings, 0 replies; 11+ messages in thread
From: Juergen Gross @ 2016-08-25  5:41 UTC (permalink / raw)
  To: David Vrabel, xen-devel; +Cc: Boris Ostrovsky

On 22/08/16 19:25, David Vrabel wrote:
> On 04/08/16 16:16, David Vrabel wrote:
>> Currently libxencall using mlocked buffers for hypercall buffers.
>> This pages are subject to compaction and page migration. A userspace
>> process may see a hypercall fail with -EFAULT if a page backing a
>> hypercall buffer is in the process of being migrated.
>>
>> Page migration can be prevented by taking an additional reference to
>> the page.
>>
>> There are three possible ways to do this:
>>
>> 1. Add a new device which when mmap()'d populated the VMA with
>>    suitable memory that has an additional reference.  This would give
>>    VMAs that have appropriate properties set (e.g., VM_DONTCOPY)
>>    without having to do additional madvise() calls.
>>
>> 2. Add a new ioctl to privcmd to populate a VMA with suitably
>>    ref-counted pages.  However, mmap() on privcmd is already used for
>>    foreign mappings and the VMA properties for hypercall buffers and
>>    foreign mappings might need to be different and the handling of
>>    unmap will need to be different.  This might be tricky.
>>
>> 3. Add a pair of ioctls to privcmd to lock/unlock a buffer by
>>    getting/putting an additional reference to the page.  This is
>>    what's implemented in this series.
>>
>> Any thoughts on which is the best approach?
> 
> Did no one have an opinions on these three options?

I think 3 is the best solution.

So for this series:

Acked-by: Juergen Gross <jgross@suse.com>


Juergen


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCHv2 0/2] xen/privcmd: prevent page migration for hypercall buffers
  2016-08-04 15:16 [PATCHv2 0/2] xen/privcmd: prevent page migration for hypercall buffers David Vrabel
                   ` (2 preceding siblings ...)
  2016-08-22 17:25 ` [PATCHv2 0/2] xen/privcmd: prevent page migration for " David Vrabel
@ 2016-10-10 13:45 ` David Vrabel
  3 siblings, 0 replies; 11+ messages in thread
From: David Vrabel @ 2016-10-10 13:45 UTC (permalink / raw)
  To: David Vrabel, xen-devel; +Cc: Juergen Gross, Boris Ostrovsky

On 04/08/16 16:16, David Vrabel wrote:
> Currently libxencall using mlocked buffers for hypercall buffers.
> This pages are subject to compaction and page migration. A userspace
> process may see a hypercall fail with -EFAULT if a page backing a
> hypercall buffer is in the process of being migrated.
> 
> Page migration can be prevented by taking an additional reference to
> the page.
> 
> There are three possible ways to do this:
> 
> 1. Add a new device which when mmap()'d populated the VMA with
>    suitable memory that has an additional reference.  This would give
>    VMAs that have appropriate properties set (e.g., VM_DONTCOPY)
>    without having to do additional madvise() calls.
> 
> 2. Add a new ioctl to privcmd to populate a VMA with suitably
>    ref-counted pages.  However, mmap() on privcmd is already used for
>    foreign mappings and the VMA properties for hypercall buffers and
>    foreign mappings might need to be different and the handling of
>    unmap will need to be different.  This might be tricky.
> 
> 3. Add a pair of ioctls to privcmd to lock/unlock a buffer by
>    getting/putting an additional reference to the page.  This is
>    what's implemented in this series.

This method appears to be insufficient -- I'm still seeing very
occasional, spurious -EFAULT errors.

The easiest workaround is to set vm.compact_unevictable_allowed = 0.

David

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-10-10 13:46 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-04 15:16 [PATCHv2 0/2] xen/privcmd: prevent page migration for hypercall buffers David Vrabel
2016-08-04 15:16 ` [PATCHv2 1/2] xen/prvicmd: use ENOTTY if the IOCTL is not supported David Vrabel
2016-08-08 10:35   ` Juergen Gross
2016-08-04 15:16 ` [PATCHv2 2/2] xen/privcmd: add ioctls for locking/unlocking hypercall buffers David Vrabel
2016-08-04 16:02   ` Jan Beulich
2016-08-22 17:24     ` David Vrabel
2016-08-23  8:46       ` Jan Beulich
2016-08-23  8:54         ` Andrew Cooper
2016-08-22 17:25 ` [PATCHv2 0/2] xen/privcmd: prevent page migration for " David Vrabel
2016-08-25  5:41   ` Juergen Gross
2016-10-10 13:45 ` David Vrabel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.